I want to merge two dictionaries into a new dictionary.
x = {'a': 1, 'b': 2}
y = {'b': 3, 'c': 4}
z = merge(x, y)
>>> z
{'a': 1, 'b': 3, 'c': 4}
Whenever a key k
is present in both dictionaries, only the value y[k]
should be kept.
>>> x = {'a':1, 'b': 2}
>>> y = {'b':10, 'c': 11}
>>> x, z = dict(x), x.update(y) or x
>>> x
{'a': 1, 'b': 2}
>>> y
{'c': 11, 'b': 10}
>>> z
{'a': 1, 'c': 11, 'b': 10}
x
with its copy. If x
is a function argument this won't work (see example)
Commented
Feb 22, 2019 at 9:27
In Python 3.9
Based on PEP 584, the new version of Python introduces two new operators for dictionaries: union (|) and in-place union (|=). You can use | to merge two dictionaries, while |= will update a dictionary in place:
>>> pycon = {2016: "Portland", 2018: "Cleveland"}
>>> europython = {2017: "Rimini", 2018: "Edinburgh", 2019: "Basel"}
>>> pycon | europython
{2016: 'Portland', 2018: 'Edinburgh', 2017: 'Rimini', 2019: 'Basel'}
>>> pycon |= europython
>>> pycon
{2016: 'Portland', 2018: 'Edinburgh', 2017: 'Rimini', 2019: 'Basel'}
If d1 and d2 are two dictionaries, then d1 | d2
does the same as {**d1, **d2}
. The | operator is used for calculating the union of sets, so the notation may already be familiar to you.
One advantage of using |
is that it works on different dictionary-like types and keeps the type through the merge:
>>> from collections import defaultdict
>>> europe = defaultdict(lambda: "", {"Norway": "Oslo", "Spain": "Madrid"})
>>> africa = defaultdict(lambda: "", {"Egypt": "Cairo", "Zimbabwe": "Harare"})
>>> europe | africa
defaultdict(<function <lambda> at 0x7f0cb42a6700>,
{'Norway': 'Oslo', 'Spain': 'Madrid', 'Egypt': 'Cairo', 'Zimbabwe': 'Harare'})
>>> {**europe, **africa}
{'Norway': 'Oslo', 'Spain': 'Madrid', 'Egypt': 'Cairo', 'Zimbabwe': 'Harare'}
You can use a defaultdict when you want to effectively handle missing keys. Note that |
preserves the defaultdict, while {**europe, **africa}
does not.
There are some similarities between how |
works for dictionaries and how +
works for lists. In fact, the +
operator was originally proposed to merge dictionaries as well. This correspondence becomes even more evident when you look at the in-place operator.
The basic use of |=
is to update a dictionary in place, similar to .update()
:
>>> libraries = {
... "collections": "Container datatypes",
... "math": "Mathematical functions",
... }
>>> libraries |= {"zoneinfo": "IANA time zone support"}
>>> libraries
{'collections': 'Container datatypes', 'math': 'Mathematical functions',
'zoneinfo': 'IANA time zone support'}
When you merge dictionaries with |
, both dictionaries need to be of a proper dictionary type. On the other hand, the in-place operator (|=
) is happy to work with any dictionary-like data structure:
>>> libraries |= [("graphlib", "Functionality for graph-like structures")]
>>> libraries
{'collections': 'Container datatypes', 'math': 'Mathematical functions',
'zoneinfo': 'IANA time zone support',
'graphlib': 'Functionality for graph-like structures'}
Python 3.9+ only
Merge (|) and update (|=) operators have been added to the built-in dict
class.
>>> d = {'spam': 1, 'eggs': 2, 'cheese': 3}
>>> e = {'cheese': 'cheddar', 'aardvark': 'Ethel'}
>>> d | e
{'spam': 1, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}
The augmented assignment version operates in-place:
>>> d |= e
>>> d
{'spam': 1, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}
See PEP 584
I know this does not really fit the specifics of the questions ("one liner"), but since none of the answers above went into this direction while lots and lots of answers addressed the performance issue, I felt I should contribute my thoughts.
Depending on the use case it might not be necessary to create a "real" merged dictionary of the given input dictionaries. A view which does this might be sufficient in many cases, i. e. an object which acts like the merged dictionary would without computing it completely. A lazy version of the merged dictionary, so to speak.
In Python, this is rather simple and can be done with the code shown at the end of my post. This given, the answer to the original question would be:
z = MergeDict(x, y)
When using this new object, it will behave like a merged dictionary but it will have constant creation time and constant memory footprint while leaving the original dictionaries untouched. Creating it is way cheaper than in the other solutions proposed.
Of course, if you use the result a lot, then you will at some point reach the limit where creating a real merged dictionary would have been the faster solution. As I said, it depends on your use case.
If you ever felt you would prefer to have a real merged dict
, then calling dict(z)
would produce it (but way more costly than the other solutions of course, so this is just worth mentioning).
You can also use this class to make a kind of copy-on-write dictionary:
a = { 'x': 3, 'y': 4 }
b = MergeDict(a) # we merge just one dict
b['x'] = 5
print b # will print {'x': 5, 'y': 4}
print a # will print {'y': 4, 'x': 3}
Here's the straight-forward code of MergeDict
:
class MergeDict(object):
def __init__(self, *originals):
self.originals = ({},) + originals[::-1] # reversed
def __getitem__(self, key):
for original in self.originals:
try:
return original[key]
except KeyError:
pass
raise KeyError(key)
def __setitem__(self, key, value):
self.originals[0][key] = value
def __iter__(self):
return iter(self.keys())
def __repr__(self):
return '%s(%s)' % (
self.__class__.__name__,
', '.join(repr(original)
for original in reversed(self.originals)))
def __str__(self):
return '{%s}' % ', '.join(
'%r: %r' % i for i in self.iteritems())
def iteritems(self):
found = set()
for original in self.originals:
for k, v in original.iteritems():
if k not in found:
yield k, v
found.add(k)
def items(self):
return list(self.iteritems())
def keys(self):
return list(k for k, _ in self.iteritems())
def values(self):
return list(v for _, v in self.iteritems())
ChainMap
which is available in Python 3 only and which does more or less what my code does. So shame on me for not reading everything carefully enough. But given that this only exists for Python 3, please take my answer as a contribution for the Python 2 users ;-)
This is an expression for Python 3.5 or greater that merges dictionaries using reduce
:
>>> from functools import reduce
>>> l = [{'a': 1}, {'b': 2}, {'a': 100, 'c': 3}]
>>> reduce(lambda x, y: {**x, **y}, l, {})
{'a': 100, 'b': 2, 'c': 3}
Note: this works even if the dictionary list is empty or contains only one element.
For a more efficient merge on Python 3.9 or greater, the lambda
can be replaced directly by operator.ior
:
>>> from functools import reduce
>>> from operator import ior
>>> l = [{'a': 1}, {'b': 2}, {'a': 100, 'c': 3}]
>>> reduce(ior, l, {})
{'a': 100, 'b': 2, 'c': 3}
For Python 3.8 or less, the following can be used as an alternative to ior
:
>>> from functools import reduce
>>> l = [{'a': 1}, {'b': 2}, {'a': 100, 'c': 3}]
>>> reduce(lambda x, y: x.update(y) or x, l, {})
{'a': 100, 'b': 2, 'c': 3}
operator.ior
instead.
Using a dict comprehension, you may
x = {'a':1, 'b': 2}
y = {'b':10, 'c': 11}
dc = {xi:(x[xi] if xi not in list(y.keys())
else y[xi]) for xi in list(x.keys())+(list(y.keys()))}
gives
>>> dc
{'a': 1, 'c': 11, 'b': 10}
Note the syntax for if else
in comprehension
{ (some_key if condition else default_key):(something_if_true if condition
else something_if_false) for key, value in dict_.items() }
... in list(y.keys())
instead of just ... in y
.
A union of the OP's two dictionaries would be something like:
{'a': 1, 'b': 2, 10, 'c': 11}
Specifically, the union of two entities(x
and y
) contains all the elements of x
and/or y
.
Unfortunately, what the OP asks for is not a union, despite the title of the post.
My code below is neither elegant nor a one-liner, but I believe it is consistent with the meaning of union.
From the OP's example:
x = {'a':1, 'b': 2}
y = {'b':10, 'c': 11}
z = {}
for k, v in x.items():
if not k in z:
z[k] = [(v)]
else:
z[k].append((v))
for k, v in y.items():
if not k in z:
z[k] = [(v)]
else:
z[k].append((v))
{'a': [1], 'b': [2, 10], 'c': [11]}
Whether one wants lists could be changed, but the above will work if a dictionary contains lists (and nested lists) as values in either dictionary.
union
, for clarity.
Commented
Sep 30, 2014 at 15:49
You can use toolz.merge([x, y])
for this.
I was curious if I could beat the accepted answer's time with a one line stringify approach:
I tried 5 methods, none previously mentioned - all one liner - all producing correct answers - and I couldn't come close.
So... to save you the trouble and perhaps fulfill curiosity:
import json
import yaml
import time
from ast import literal_eval as literal
def merge_two_dicts(x, y):
z = x.copy() # start with x's keys and values
z.update(y) # modifies z with y's keys and values & returns None
return z
x = {'a':1, 'b': 2}
y = {'b':10, 'c': 11}
start = time.time()
for i in range(10000):
z = yaml.load((str(x)+str(y)).replace('}{',', '))
elapsed = (time.time()-start)
print (elapsed, z, 'stringify yaml')
start = time.time()
for i in range(10000):
z = literal((str(x)+str(y)).replace('}{',', '))
elapsed = (time.time()-start)
print (elapsed, z, 'stringify literal')
start = time.time()
for i in range(10000):
z = eval((str(x)+str(y)).replace('}{',', '))
elapsed = (time.time()-start)
print (elapsed, z, 'stringify eval')
start = time.time()
for i in range(10000):
z = {k:int(v) for k,v in (dict(zip(
((str(x)+str(y))
.replace('}',' ')
.replace('{',' ')
.replace(':',' ')
.replace(',',' ')
.replace("'",'')
.strip()
.split(' '))[::2],
((str(x)+str(y))
.replace('}',' ')
.replace('{',' ').replace(':',' ')
.replace(',',' ')
.replace("'",'')
.strip()
.split(' '))[1::2]
))).items()}
elapsed = (time.time()-start)
print (elapsed, z, 'stringify replace')
start = time.time()
for i in range(10000):
z = json.loads(str((str(x)+str(y)).replace('}{',', ').replace("'",'"')))
elapsed = (time.time()-start)
print (elapsed, z, 'stringify json')
start = time.time()
for i in range(10000):
z = merge_two_dicts(x, y)
elapsed = (time.time()-start)
print (elapsed, z, 'accepted')
results:
7.693928956985474 {'c': 11, 'b': 10, 'a': 1} stringify yaml
0.29134678840637207 {'c': 11, 'b': 10, 'a': 1} stringify literal
0.2208399772644043 {'c': 11, 'b': 10, 'a': 1} stringify eval
0.1106564998626709 {'c': 11, 'b': 10, 'a': 1} stringify replace
0.07989692687988281 {'c': 11, 'b': 10, 'a': 1} stringify json
0.005082368850708008 {'c': 11, 'b': 10, 'a': 1} accepted
What I did learn from this is that JSON approach is the fastest way (of those attempted) to return a dictionary from string-of-dictionary; much faster (about 1/4th of the time) of what I considered to be the normal method using ast
. I also learned that, the YAML approach should be avoided at all cost.
Yes, I understand that this is not the best/correct way. I was curious if it was faster, and it isn't; I posted to prove it so.
json
approach is faster than ast.literal_eval
, but it's also not as comprehensive. It can't handle Python literals not in the JSON spec, so no tuple
s, set
s, frozenset
s, bool
s (it can handle JSON bools, but not the result of stringifying a Python bool directly), etc. ast.literal_eval
is slower, but at least some of that is a consequence of handling more complex inputs. That said, I'm pretty sure it could be faster if they bothered to optimize it, it's just pretty rare that evaluating strings of Python literals is the chokepoint in code.
Commented
Feb 28, 2019 at 17:29
I think my ugly one-liners are just necessary here.
z = next(z.update(y) or z for z in [x.copy()])
# or
z = (lambda z: z.update(y) or z)(x.copy())
P.S. This is a solution working in both versions of Python. I know that Python 3 has this {**x, **y}
thing and it is the right thing to use (as well as moving to Python 3 if you still have Python 2 is the right thing to do).
A method is deep merging. Making use of the |
operator in 3.9+ for the use case of dict new
being a set of default settings, and dict existing
being a set of existing settings in use. My goal was to merge in any added settings from new
without over writing existing settings in existing
. I believe this recursive implementation will allow one to upgrade a dict with new values from another dict.
def merge_dict_recursive(new: dict, existing: dict):
merged = new | existing
for k, v in merged.items():
if isinstance(v, dict):
if k not in existing:
# The key is not in existing dict at all, so add entire value
existing[k] = new[k]
merged[k] = merge_dict_recursive(new[k], existing[k])
return merged
Example test data:
new
{'dashboard': True,
'depth': {'a': 1, 'b': 22222, 'c': {'d': {'e': 69}}},
'intro': 'this is the dashboard',
'newkey': False,
'show_closed_sessions': False,
'version': None,
'visible_sessions_limit': 9999}
existing
{'dashboard': True,
'depth': {'a': 5},
'intro': 'this is the dashboard',
'newkey': True,
'show_closed_sessions': False,
'version': '2021-08-22 12:00:30.531038+00:00'}
merged
{'dashboard': True,
'depth': {'a': 5, 'b': 22222, 'c': {'d': {'e': 69}}},
'intro': 'this is the dashboard',
'newkey': True,
'show_closed_sessions': False,
'version': '2021-08-22 12:00:30.531038+00:00',
'visible_sessions_limit': 9999}
The question is tagged python-3x
but, taking into account that it's a relatively recent addition and that the most voted, accepted answer deals extensively with a Python 2.x solution, I dare add a one liner that draws on an irritating feature of Python 2.x list comprehension, that is name leaking...
$ python2
Python 2.7.13 (default, Jan 19 2017, 14:48:08)
[GCC 6.3.0 20170118] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> x = {'a':1, 'b': 2}
>>> y = {'b':10, 'c': 11}
>>> [z.update(d) for z in [{}] for d in (x, y)]
[None, None]
>>> z
{'a': 1, 'c': 11, 'b': 10}
>>> ...
I'm happy to say that the above doesn't work any more on any version of Python 3.
Deep merge of dicts:
from typing import List, Dict
from copy import deepcopy
def merge_dicts(*from_dicts: List[Dict], no_copy: bool=False) -> Dict :
""" no recursion deep merge of two dicts
By default creates fresh Dict and merges all to it.
no_copy = True, will merge all dicts to a fist one in a list without copy.
Why? Sometime I need to combine one dictionary from "layers".
The "layers" are not in use and dropped immediately after merging.
"""
if no_copy:
xerox = lambda x:x
else:
xerox = deepcopy
result = xerox(from_dicts[0])
for _from in from_dicts[1:]:
merge_queue = [(result, _from)]
for _to, _from in merge_queue:
for k, v in _from.items():
if k in _to and isinstance(_to[k], dict) and isinstance(v, dict):
# key collision add both are dicts.
# add to merging queue
merge_queue.append((_to[k], v))
continue
_to[k] = xerox(v)
return result
Usage:
print("=============================")
print("merge all dicts to first one without copy.")
a0 = {"a":{"b":1}}
a1 = {"a":{"c":{"d":4}}}
a2 = {"a":{"c":{"f":5}, "d": 6}}
print(f"a0 id[{id(a0)}] value:{a0}")
print(f"a1 id[{id(a1)}] value:{a1}")
print(f"a2 id[{id(a2)}] value:{a2}")
r = merge_dicts(a0, a1, a2, no_copy=True)
print(f"r id[{id(r)}] value:{r}")
print("=============================")
print("create fresh copy of all")
a0 = {"a":{"b":1}}
a1 = {"a":{"c":{"d":4}}}
a2 = {"a":{"c":{"f":5}, "d": 6}}
print(f"a0 id[{id(a0)}] value:{a0}")
print(f"a1 id[{id(a1)}] value:{a1}")
print(f"a2 id[{id(a2)}] value:{a2}")
r = merge_dicts(a0, a1, a2)
print(f"r id[{id(r)}] value:{r}")