
Suppose I have a list of hashmaps (<String, Object>). Values can be either scalar types or a nested HashMap of the same type.

I need to merge this list of maps into a final HashMap, where each map in the list has higher precedence than the one before it. The merge also has to be deep, i.e.:

map1   = {root: {a: a, b: {c: c, d: d}}}
map2   = {root: {a: a, b: {c: x}}}
result = {root: {a: a, b: {c: x, d: d}}}

In the previous example, the subtree b in map2 does not override the whole subtree in map1. It only overrides the leaves that are duplicated.

The code I have right now looks like this and works ok:

public static Map<String, Object> merge(List<Map<String, Object>> configs) {
    return configs.stream()
            .filter(Objects::nonNull)
            .reduce(new LinkedHashMap<>(), ConfigMerger::mergeMaps);
}

public static Map<String, Object> mergeMaps(Map<String, Object> map, Map<String, Object> other) {
    Map<String, Object> result = new LinkedHashMap<>(map.size() + other.size());
    result.putAll(map);

    for (Map.Entry<String, Object> entry : other.entrySet()) {
        String key = entry.getKey();
        Object val2 = entry.getValue();
        Object val1 = result.get(key);

        if (val1 instanceof Map && val2 instanceof Map) {
            Map<String, Object> nestedResult = mergeMaps((Map<String, Object>) val1, (Map<String, Object>) val2);
            result.put(key, nestedResult);
        } else {
            result.put(key, val2);
        }
    }

    return result;
}
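For a quick sanity check of the worked example above, the method can be exercised directly. This is a self-contained copy of mergeMaps, wrapped in a hypothetical Demo class (the class name and main are mine, the method body is the code under review):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class Demo {
    // Verbatim copy of mergeMaps from the question, for checking the worked example.
    @SuppressWarnings("unchecked")
    static Map<String, Object> mergeMaps(Map<String, Object> map, Map<String, Object> other) {
        Map<String, Object> result = new LinkedHashMap<>(map.size() + other.size());
        result.putAll(map);
        for (Map.Entry<String, Object> entry : other.entrySet()) {
            Object val2 = entry.getValue();
            Object val1 = result.get(entry.getKey());
            if (val1 instanceof Map && val2 instanceof Map) {
                result.put(entry.getKey(),
                        mergeMaps((Map<String, Object>) val1, (Map<String, Object>) val2));
            } else {
                result.put(entry.getKey(), val2);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> map1 = Map.of("root", Map.of("a", "a", "b", Map.of("c", "c", "d", "d")));
        Map<String, Object> map2 = Map.of("root", Map.of("a", "a", "b", Map.of("c", "x")));
        // map2 wins on c, map1's d survives; entry order depends on Map.of iteration order
        System.out.println(mergeMaps(map1, map2));
    }
}
```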

The only issue is that result.putAll(map); is a bit wasteful: I do it for every nested map in each of the original maps.
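For illustration, one way to avoid most of that copying is to fold everything into a single mutable accumulator, copying a nested map only the first time it is inserted so the inputs stay untouched. This is a sketch, not the code under review; the InPlaceMerger class name is hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class InPlaceMerger {
    // Merges configs into one accumulator, later maps taking precedence.
    // Only maps created by the merger itself are ever mutated.
    public static Map<String, Object> merge(List<Map<String, Object>> configs) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map<String, Object> config : configs) {
            if (config != null) {
                mergeInto(result, config);
            }
        }
        return result;
    }

    @SuppressWarnings("unchecked")
    private static void mergeInto(Map<String, Object> target, Map<String, Object> source) {
        for (Map.Entry<String, Object> entry : source.entrySet()) {
            Object newVal = entry.getValue();
            Object oldVal = target.get(entry.getKey());
            if (oldVal instanceof Map && newVal instanceof Map) {
                // target's nested maps are always copies we own, so recursing
                // into them mutates nothing belonging to the inputs
                mergeInto((Map<String, Object>) oldVal, (Map<String, Object>) newVal);
            } else if (newVal instanceof Map) {
                // deep-copy on first insert so later merges can mutate it
                Map<String, Object> copy = new LinkedHashMap<>();
                mergeInto(copy, (Map<String, Object>) newVal);
                target.put(entry.getKey(), copy);
            } else {
                target.put(entry.getKey(), newVal);
            }
        }
    }
}
```

This still copies each distinct subtree once, but no longer once per merge step; whether that wins in practice depends on how much the configs overlap.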

I tried a slightly more complicated approach, which is around 20% faster, yet the most expensive method is still HashSet.addAll:

// (assumes the list is in reversed order)
public static Map<String, Object> merge2(List<Map<String, Object>> configs) {
    Map<String, Object> result = new HashMap<>();
    Set<String> allKeys = new HashSet<>();
    for (Map<String, Object> config : configs) {
        allKeys.addAll(config.keySet());
    }
    for (String key : allKeys) {
        if (configs.get(0).get(key) instanceof Map<?, ?>) {
            List<Map<String, Object>> nestedConfigs = new ArrayList<>();
            for (Map<String, Object> config : configs) {
                if (config.get(key) instanceof Map<?, ?>) {
                    nestedConfigs.add((Map<String, Object>) config.get(key));
                }
            }
            result.put(key, merge2(nestedConfigs));
        } else {
            for (Map<String, Object> config : configs) {
                if (config.containsKey(key)) {
                    result.put(key, config.get(key));
                    break;
                }
            }
        }
    }
    return result;
}

Is there any other approach I could try that could help me squeeze a bit more of performance out of this?

For context, the maps are usually huge (hundreds of subtrees) and the nesting level up to fifty sometimes.

2 Answers

any other approach ... that could help me squeeze a bit more performance?

Choose a nice output format for your maps, and serialize each one to a text file (or to an RDBMS). For example the original result might come out as:

a:    a
b.c:  x
b.d:  d

Choose a stable sort, and perform an external merge sort of all files. For example, /usr/bin/sort -s springs to mind.

Uniquify duplicate keys. Deserialize the result into a Java Map.
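The flattening step this describes might be sketched in Java like so (hypothetical Flattener class; nested keys are joined with dots, which assumes the keys themselves contain no dots):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class Flattener {
    // Flattens {a: a, b: {c: x, d: d}} into {"a": a, "b.c": x, "b.d": d},
    // the line-oriented form an external sort can work with.
    public static Map<String, Object> flatten(Map<String, Object> map) {
        Map<String, Object> flat = new LinkedHashMap<>();
        flatten("", map, flat);
        return flat;
    }

    @SuppressWarnings("unchecked")
    private static void flatten(String prefix, Map<String, Object> map, Map<String, Object> out) {
        for (Map.Entry<String, Object> e : map.entrySet()) {
            String key = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            if (e.getValue() instanceof Map) {
                flatten(key, (Map<String, Object>) e.getValue(), out);
            } else {
                out.put(key, e.getValue());
            }
        }
    }
}
```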


Certainly there are ways to accomplish this in RAM, without ever leaving the JVM.

  • Yeah, ideally I would want to do something where I do not leave the JVM. For context, we run these config merges in the order of around 5 thousand per second.
    – Cristian
    Commented Jun 8, 2023 at 6:10
  • You explain that you wish to process maps that are "huge" in < 200 μs. That suggests an incremental approach. Yet you're not offering a Δ map of changed keys. Seems like a disconnect. We desire O(Δ) effort, yet the input requires that we scan O(N) entries. Oh, well.
    – J_H
    Commented Jun 8, 2023 at 6:46
  • You present uncommented code.
    While I might succeed in figuring out what mergeMaps() achieves, it may be intended to do something slightly different:

    • Document your code. In the code.
      Use "Javadoc style" comments to document what everything intended for external use is there for.
  • I don't think merge2() even works as intended: with

    map1 = {root: {a: a, b: {c: c, d: d}}},
    map2 = {root: {b: {c: x}}},
    map3 = {root: {a: a, d: e}}
    

    wouldn't b go un-merged, since it does not appear in map3 (which, in the reversed order, is configs.get(0))?

  • if val2 isn't a Map, you never use val1:

          Object val1 = null;
    
          if (val2 instanceof Map && (val1 = result.get(key)) instanceof Map)
              val2 = mergeMaps((Map<String, Object>) val1, (Map<String, Object>) val2);
          result.put(key, val2);
    

In the end, every result value that is a Map will "need to have been merged", so where does mergeMaps() invest futile effort?
I think it is in merging Maps that don't end up in the result, as well as in
instantiating a new Map plus a copy for every Map merged in.
Trying to avoid both, using Nat's class Reversed:

// When merging maps in a list with later maps taking precedence
//  and a non-map value replacing any resulting map so far,
// there is one catch to processing in reverse:
// one needs to tell *should be merged* from isCompleted
static class CompletableMap<K, V> extends LinkedHashMap<K, V> {
    private boolean completed = false;
    void       complete() { completed = true; }
    boolean isCompleted() { return completed; }

    CompletableMap() { super(); }
    CompletableMap(int initialCapacity) { super(initialCapacity); }
    CompletableMap(Map<? extends K, ? extends V> m) { super(m); }
}

/** merges <code>configurations</code>, later entries taking precedence. */
public static Map<String, Object> 
merge(List<Map<String, Object>> configurations) {
    Map<String, Object> configuration = new LinkedHashMap<>();
    if (null != configurations) {
        // for (ListIterator<Map<String, Object>> configs = configurations.listIterator(configurations.size()) ; configs.hasPrevious() ; )
        //     mergeMaps(configuration, configs.previous());
        for (Map<String, Object> previous : Reversed.reversed(configurations))
            mergeMaps(configuration, previous);
    }
    return configuration;
}

/** merges <code>accumulating</code>
 *     and <code>previousConfiguration</code>,
 *   entries in <code>accumulating</code> taking precedence.
 *///public 
static void mergeMaps(Map<String, Object> accumulating, 
                      Map<String, Object> previousConfiguration) {
    for (Map.Entry<String, Object> 
         previous : previousConfiguration.entrySet()) {
        String key = previous.getKey();
        Object
            accumulated = accumulating.get(key),
            previousValue = previous.getValue();
        if (previousValue instanceof Map) {
            Map<String, Object> 
                previousMap = (Map<String, Object>) previousValue;
            if (null == accumulated)
                accumulating.put(key,
                   new CompletableMap<String, Object>(previousMap));
            else if (accumulated instanceof Map
                     && !((CompletableMap<String, Object>)accumulated)
                          .isCompleted())
                mergeMaps((Map<String, Object>) accumulated, previousMap);
        } else if (null == accumulated)
            accumulating.put(key, previousValue);
        else if (accumulated instanceof CompletableMap)
            ((CompletableMap<String, Object>)accumulated).complete();
    }
}

public static void main(String []args) {
    List<Map<String, Object>> c = List.of(
        Map.of("a", "1"),
        Map.of("b", Map.of("c", "2", "d", "3")),
        Map.of("b", "4"),
        Map.of("b", Map.of("c", "5", "e", "6")),
        Map.of("b", Map.of("c", "7")),
        Map.of("b", Map.of("c", "8")));
    System.out.println(merge(c));
}

Caveat: untested beyond trying out the main() above.
Caught unawares: the top-level entries come out in reverse order.
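The Reversed helper referenced above is not shown in the answer; a minimal sketch of what it presumably looks like (a copy-free, reverse-order Iterable view over a List) could be:

```java
import java.util.Iterator;
import java.util.List;
import java.util.ListIterator;

// Minimal adapter exposing a List as a reverse-order Iterable, without
// copying the list: iteration walks a ListIterator backwards.
public class Reversed<T> implements Iterable<T> {
    private final List<T> original;

    private Reversed(List<T> original) { this.original = original; }

    public static <T> Reversed<T> reversed(List<T> original) {
        return new Reversed<>(original);
    }

    @Override
    public Iterator<T> iterator() {
        ListIterator<T> it = original.listIterator(original.size());
        return new Iterator<T>() {
            @Override public boolean hasNext() { return it.hasPrevious(); }
            @Override public T next() { return it.previous(); }
        };
    }
}
```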

  • From Java 19, CompletableMap(int initialCapacity) should probably call LinkedHashMap.newLinkedHashMap(initialCapacity).
    – greybeard
    Commented Jun 8, 2023 at 10:09
  • First of all, thanks for taking the time to write a response. Let me answer some of your smaller questions here, and expand on the original post after I edit it: 1. The code I pasted does not have comments because I removed them, as they are not that helpful. I will not only add them in the edit, but include the original code (which is Kotlin, not Java). 2. I know merge2 does not work as-is; that's why I added a comment explaining that a reversal of the list must be in place. 3. About the reversed thing... yeah, I get it, and I tried something similar, but the performance gains are minimal.
    – Cristian
    Commented Jun 8, 2023 at 20:12
  • expand on the original post after I edit it (Heeding "What should I not do when someone answers my question?", hopefully.)
    – greybeard
    Commented Jun 8, 2023 at 20:16
  • Fair. In my defense, the code I pasted, even though it's "better", still does not satisfy the original premise: how to efficiently merge a list of deeply nested maps.
    – Cristian
    Commented Jun 8, 2023 at 20:21
  • When asking a question about a Kotlin implementation of a solution to the same problem, provide more context. What are those 5000 configurations per second used for? There may be an approach less resource-hungry than creating a merged Map. Are you aware of the concept of a Map with a backup Map, delegating lookups to another Map for unmapped keys, possibly caching results?
    – greybeard
    Commented Jun 9, 2023 at 7:50
