I have data like --

sample 1, domain 1, value 1
sample 1, domain 2, value 1
sample 2, domain 1, value 1
sample 2, domain 3, value 1

-- stored in a dictionary --

dict_1 = {('sample 1', 'domain 1'): 'value 1', ('sample 1', 'domain 2'): 'value 1'}

-- etc.

Now, I have a different kind of value, named value 2 --

sample 1, domain 1, value 2
sample 1, domain 2, value 2
sample 2, domain 1, value 2
sample 2, domain 3, value 2

-- which I again put in a dictionary,

dict_2 = {('sample 1', 'domain 1'): 'value 2', ('sample 1', 'domain 2'): 'value 2'}

How can I merge these two dictionaries in Python? The keys, for instance ('sample 1', 'domain 1'), are the same in both dictionaries.

I expect it to look like --

final_dict = {('sample 1', 'domain 1'): ('value 1', 'value 2'), ('sample 1', 'domain 2'): ('value 1', 'value 2')}

-- etc.

  • what do you expect these two dictionaries "merged" should look like? Commented Jan 8, 2019 at 14:57
  • 1
    Possible duplicate of How to merge two dictionaries in a single expression? Commented Jan 8, 2019 at 14:58
  • @JaredSmith: Not a duplicate (of that question in any event); this one seems to want to preserve the values from both dicts, not keep the last value for a given key. Commented Jan 8, 2019 at 14:58
  • What do you mean by "merge"? What is the expected output (concrete example)?
    – Him
    Commented Jan 8, 2019 at 14:59
  • 2
    Without seeing both SPARQL queries, it's impossible to say whether a single query would be possible ... Commented Jan 8, 2019 at 15:14

4 Answers

2

The closest you're likely to get to this would be a dict of lists (or sets). For simplicity, you usually go with collections.defaultdict(list) so you're not constantly checking if the key already exists. You need to map to some collection type as a value because dicts have unique keys, so you need some way to group the multiple values you want to store for each key.

from collections import defaultdict

final_dict = defaultdict(list)

for d in (dict_1, dict_2):
    for k, v in d.items():
        final_dict[k].append(v)

Or equivalently with itertools.chain, you just change the loop to:

from itertools import chain

for k, v in chain(dict_1.items(), dict_2.items()):
    final_dict[k].append(v)

Side-note: If you really need a plain dict at the end, and/or insist on the values being tuples rather than lists, a final pass can do the conversion:

final_dict = {k: tuple(v) for k, v in final_dict.items()}
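
To see how this behaves when the two dictionaries do not share every key, here is a minimal self-contained sketch. The numeric values and the choice of which keys are missing are made up for illustration and are not from the question. A key present in only one dictionary simply ends up with a one-element tuple.

```python
from collections import defaultdict

# Illustrative data only: the values and the missing keys are assumptions.
dict_1 = {('sample 1', 'domain 1'): 0.5, ('sample 1', 'domain 2'): 1.5}
dict_2 = {('sample 1', 'domain 1'): 10, ('sample 2', 'domain 3'): 30}

# Group every value under its key, regardless of which dict it came from.
grouped = defaultdict(list)
for d in (dict_1, dict_2):
    for k, v in d.items():
        grouped[k].append(v)

# Final pass: plain dict with tuple values.
final_dict = {k: tuple(v) for k, v in grouped.items()}
print(final_dict)
# {('sample 1', 'domain 1'): (0.5, 10), ('sample 1', 'domain 2'): (1.5,), ('sample 2', 'domain 3'): (30,)}
```

Note that a one-element tuple does not record which dictionary the value came from, so this form is most useful when both dictionaries are known to have the same keys.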
4
  • This would be the cleanest way to do it without suffering from KeyErrors. +1 Commented Jan 8, 2019 at 15:07
  • The first solution seems to be working for me (strangely, the second does not). I merely need to have the values be ready for plotting against each other. Massive thanks!
    – Dymphy
    Commented Jan 8, 2019 at 15:48
  • @Dymphy: To be clear, the second block should still include the import for defaultdict and the initialization of the empty final_dict from the first block. I omitted them for brevity, but final_dict still needs to be a defaultdict(list) in both cases. The second block is just showing a way to reduce the level of loop nesting. If adding the imports and initial definition of final_dict doesn't make the second loop work, let me know (and provide the error), because it should be exactly equivalent. Commented Jan 8, 2019 at 17:41
  • Yes, that was indeed the problem. Thank you for your help!
    – Dymphy
    Commented Jan 9, 2019 at 16:07
1

You can use set intersection of keys to do this:

dict_1 = {('sample 1','domain 1'): 'value 1', ('sample 1', 'domain 2'): 'value 1'} 
dict_2 = {('sample 1','domain 1'): 'value 2', ('sample 1', 'domain 2'): 'value 2'} 

result = {k: (dict_1.get(k), dict_2.get(k)) for k in dict_1.keys() & dict_2.keys()}

print(result)
# {('sample 1', 'domain 1'): ('value 1', 'value 2'), ('sample 1', 'domain 2'): ('value 1', 'value 2')}

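The above uses dict.get(), which returns None by default instead of raising a KeyError for a missing key. Since the comprehension only iterates over keys present in both dictionaries, a KeyError cannot actually occur here, so this is purely defensive.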

As @ShadowRanger suggests in the comments, if a key might be missing from one dictionary, you could iterate over the union of the keys and fall back to the value from the other dictionary:

{k: (dict_1.get(k, dict_2.get(k)), dict_2.get(k, dict_1.get(k))) for k in dict_1.keys() | dict_2.keys()}
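
The two comprehensions only differ when the key sets differ. A small sketch with made-up values (which keys are missing from which dictionary is an assumption chosen for illustration):

```python
# Illustrative data only: the values and the missing keys are assumptions.
dict_1 = {('sample 1', 'domain 1'): 'a1', ('sample 1', 'domain 2'): 'a2'}
dict_2 = {('sample 1', 'domain 1'): 'b1', ('sample 2', 'domain 3'): 'b3'}

# Intersection: only keys present in both dictionaries survive.
common = {k: (dict_1[k], dict_2[k]) for k in dict_1.keys() & dict_2.keys()}

# Union with fallback: a key missing on one side reuses the other side's value.
either = {
    k: (dict_1.get(k, dict_2.get(k)), dict_2.get(k, dict_1.get(k)))
    for k in dict_1.keys() | dict_2.keys()
}

print(common)
# Set iteration order is arbitrary, so sort before printing.
print(sorted(either.items()))
```

With the union version, a key found in only one dictionary produces a tuple holding the same value twice, which may or may not be what you want for plotting.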
1
  • 1
    @RoadRunner: Or for amusement (or seriously for some really esoteric scenarios), you could make each get's default the value from the other dict, so if only one dict has the key, you get a tuple with the same value twice: {k: (dict_1.get(k, dict_2.get(k)), dict_2.get(k, dict_1.get(k))) for k in dict_1.keys() | dict_2.keys()} :-) A little wasteful, but harmless. Commented Jan 8, 2019 at 15:09
0

Does something handcrafted like this work for you?

# Assumes dict2 contains every key of dict1; otherwise dict2[i] raises KeyError.
dict3 = {}
for i in dict1:
    dict3[i] = (dict1[i], dict2[i])
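
If the key sets might not match, one hedged option is to check up front and fail with a clear message rather than a bare KeyError partway through the loop. The helper name merge_same_keys is hypothetical, chosen here for illustration, and the sample values are made up:

```python
def merge_same_keys(dict1, dict2):
    """Pair up values from two dicts that must have identical key sets."""
    if dict1.keys() != dict2.keys():
        # Symmetric difference: keys found in exactly one of the dicts.
        mismatched = dict1.keys() ^ dict2.keys()
        raise ValueError(f"keys differ: {sorted(mismatched)}")
    return {k: (dict1[k], dict2[k]) for k in dict1}


# Illustrative data only.
dict1 = {('sample 1', 'domain 1'): 1, ('sample 1', 'domain 2'): 2}
dict2 = {('sample 1', 'domain 1'): 3, ('sample 1', 'domain 2'): 4}
dict3 = merge_same_keys(dict1, dict2)
print(dict3)
# {('sample 1', 'domain 1'): (1, 3), ('sample 1', 'domain 2'): (2, 4)}
```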
1
  • Sorry for that. Have you tried replacing dict1 and dict2 with the names of your dictionaries? Are you sure that the two dictionaries have the same keys?
    – jimifiki
    Commented Jan 8, 2019 at 16:43
-1
from collections import defaultdict
from itertools import chain

dict_1 = {('sample 1', 'domain 1'): 1, ('sample 1', 'domain 2'): 2}
dict_2 = {('sample 1', 'domain 1'): 3, ('sample 1', 'domain 2'): 4}

new_dict_to_process = defaultdict(list)
dict_list = [dict_1.items(), dict_2.items()]
# chain(*dict_list) yields the (key, value) pairs of each dict in turn
for k, v in chain(*dict_list):
    new_dict_to_process[k].append(v)

Printing new_dict_to_process then gives

defaultdict(<class 'list'>, {('sample 1', 'domain 1'): [1, 3], ('sample 1', 'domain 2'): [2, 4]})
1
  • That's not how chain works... The nested loop structure makes no sense at all here. Commented Jan 8, 2019 at 15:05
