
Suppose I have a list of hashmaps (<String, Object>). Values can be either scalar types or a nested HashMap of the same type.

I need to merge this list of maps into a final HashMap, where each map in the list has higher precedence than the one before it. The merge also has to be deep, i.e.:

map1   = {root: {a: a, b: {c: c, d: d}}}
map2   = {root: {a: a, b: {c: x}}}
result = {root: {a: a, b: {c: x, d: d}}}

In the previous example, the subtree b in map2 does not override the whole subtree in map1. It only overrides the leaves that are duplicated.

The code I have right now looks like this and works ok:

public static Map<String, Object> merge(List<Map<String, Object>> configs) {
    return configs.stream()
            .filter(Objects::nonNull)
            .reduce(new LinkedHashMap<>(), ConfigMerger::mergeMaps);
}

public static Map<String, Object> mergeMaps(Map<String, Object> map, Map<String, Object> other) {
    Map<String, Object> result = new LinkedHashMap<>(map.size() + other.size());
    result.putAll(map);

    for (Map.Entry<String, Object> entry : other.entrySet()) {
        String key = entry.getKey();
        Object val2 = entry.getValue();
        Object val1 = result.get(key);

        if (val1 instanceof Map && val2 instanceof Map) {
            Map<String, Object> nestedResult = mergeMaps((Map<String, Object>) val1, (Map<String, Object>) val2);
            result.put(key, nestedResult);
        } else {
            result.put(key, val2);
        }
    }

    return result;
}
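For a quick sanity check of the worked example above, the method can be exercised directly. This is a self-contained copy of mergeMaps, wrapped in a hypothetical Demo class (the class name and main are mine, the method body is the code under review):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class Demo {
    // Verbatim copy of mergeMaps from the question, for checking the worked example.
    @SuppressWarnings("unchecked")
    static Map<String, Object> mergeMaps(Map<String, Object> map, Map<String, Object> other) {
        Map<String, Object> result = new LinkedHashMap<>(map.size() + other.size());
        result.putAll(map);
        for (Map.Entry<String, Object> entry : other.entrySet()) {
            Object val2 = entry.getValue();
            Object val1 = result.get(entry.getKey());
            if (val1 instanceof Map && val2 instanceof Map) {
                result.put(entry.getKey(),
                        mergeMaps((Map<String, Object>) val1, (Map<String, Object>) val2));
            } else {
                result.put(entry.getKey(), val2);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> map1 = Map.of("root", Map.of("a", "a", "b", Map.of("c", "c", "d", "d")));
        Map<String, Object> map2 = Map.of("root", Map.of("a", "a", "b", Map.of("c", "x")));
        // map2 wins on c, map1's d survives; entry order depends on Map.of iteration order
        System.out.println(mergeMaps(map1, map2));
    }
}
```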

The only issue is that result.putAll(map); is a bit wasteful: I do it for every nested map in each of the original maps.
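For illustration, one way to avoid most of that copying is to fold everything into a single mutable accumulator, copying a nested map only the first time it is inserted so the inputs stay untouched. This is a sketch, not the code under review; the InPlaceMerger class name is hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class InPlaceMerger {
    // Merges configs into one accumulator, later maps taking precedence.
    // Only maps created by the merger itself are ever mutated.
    public static Map<String, Object> merge(List<Map<String, Object>> configs) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map<String, Object> config : configs) {
            if (config != null) {
                mergeInto(result, config);
            }
        }
        return result;
    }

    @SuppressWarnings("unchecked")
    private static void mergeInto(Map<String, Object> target, Map<String, Object> source) {
        for (Map.Entry<String, Object> entry : source.entrySet()) {
            Object newVal = entry.getValue();
            Object oldVal = target.get(entry.getKey());
            if (oldVal instanceof Map && newVal instanceof Map) {
                // target's nested maps are always copies we own, so recursing
                // into them mutates nothing belonging to the inputs
                mergeInto((Map<String, Object>) oldVal, (Map<String, Object>) newVal);
            } else if (newVal instanceof Map) {
                // deep-copy on first insert so later merges can mutate it
                Map<String, Object> copy = new LinkedHashMap<>();
                mergeInto(copy, (Map<String, Object>) newVal);
                target.put(entry.getKey(), copy);
            } else {
                target.put(entry.getKey(), newVal);
            }
        }
    }
}
```

This still copies each distinct subtree once, but no longer once per merge step; whether that wins in practice depends on how much the configs overlap.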

I tried a slightly more complicated approach, which is around 20% faster, yet the most expensive method is still HashSet.addAll:

// (assumes the list is in reversed order)
public static Map<String, Object> merge2(List<Map<String, Object>> configs) {
    Map<String, Object> result = new HashMap<>();
    Set<String> allKeys = new HashSet<>();
    for (Map<String, Object> config : configs) {
        allKeys.addAll(config.keySet());
    }
    for (String key : allKeys) {
        if (configs.get(0).get(key) instanceof Map<?, ?>) {
            List<Map<String, Object>> nestedConfigs = new ArrayList<>();
            for (Map<String, Object> config : configs) {
                if (config.get(key) instanceof Map<?, ?>) {
                    nestedConfigs.add((Map<String, Object>) config.get(key));
                }
            }
            result.put(key, merge2(nestedConfigs));
        } else {
            for (Map<String, Object> config : configs) {
                if (config.containsKey(key)) {
                    result.put(key, config.get(key));
                    break;
                }
            }
        }
    }
    return result;
}

Is there any other approach I could try that could help me squeeze a bit more of performance out of this?

For context, the maps are usually huge (hundreds of subtrees) and the nesting level up to fifty sometimes.

2 Answers

any other approach ... that could help me squeeze a bit more performance?

Choose a nice output format for your maps, and serialize each one to a text file (or to an RDBMS). For example the original result might come out as:

a:    a
b.c:  x
b.d:  d

Choose a stable sort, and perform an external merge sort of all files. For example, /usr/bin/sort -s springs to mind.

Uniquify duplicate keys. Deserialize the result into a Java Map.
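The flattening step this describes might be sketched in Java like so (hypothetical Flattener class; nested keys are joined with dots, which assumes the keys themselves contain no dots):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class Flattener {
    // Flattens {a: a, b: {c: x, d: d}} into {"a": a, "b.c": x, "b.d": d},
    // the line-oriented form an external sort can work with.
    public static Map<String, Object> flatten(Map<String, Object> map) {
        Map<String, Object> flat = new LinkedHashMap<>();
        flatten("", map, flat);
        return flat;
    }

    @SuppressWarnings("unchecked")
    private static void flatten(String prefix, Map<String, Object> map, Map<String, Object> out) {
        for (Map.Entry<String, Object> e : map.entrySet()) {
            String key = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            if (e.getValue() instanceof Map) {
                flatten(key, (Map<String, Object>) e.getValue(), out);
            } else {
                out.put(key, e.getValue());
            }
        }
    }
}
```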


Certainly there are ways to accomplish this in RAM, without ever leaving the JVM.

  • Yeah, ideally I would want to do something where I do not leave the JVM. For context, we run these config merges in the order of around 5 thousand per second.
    – Cristian
    Commented Jun 8, 2023 at 6:10
  • You explain that you wish to process maps that are "huge" in < 200 μs. That suggests an incremental approach. Yet you're not offering a Δ map of changed keys. Seems like a disconnect. We desire O(Δ) effort, yet the input requires that we scan O(N) entries. Oh, well.
    – J_H
    Commented Jun 8, 2023 at 6:46
  • You present uncommented code.
    While I might succeed in figuring out what mergeMaps() achieves, it may be intended to do something slightly different:

    • Document your code. In the code.
      Use "Javadoc style" comments to document what everything intended for external use is there for.
  • I don't think merge2() even works as intended: with

    map1 = {root: {a: a, b: {c: c, d: d}}},
    map2 = {root: {b: {c: x}}},
    map3 = {root: {a: a, d: e}}
    

    wouldn't b go un-merged, since it does not appear in map3 (which, in the reversed order, is configs.get(0))?

  • if val2 isn't a Map, you never use val1:

          Object val1 = null;
    
          if (val2 instanceof Map && (val1 = result.get(key)) instanceof Map)
              val2 = mergeMaps((Map<String, Object>) val1, (Map<String, Object>) val2);
          result.put(key, val2);
    

In the end, every result value that is a Map will "need to have been merged", so where does mergeMaps() invest futile effort?
I think it is in merging Maps that don't end up in the result, as well as in
instantiating a new Map plus a copy for every Map merged in.
Trying to avoid both, using Nat's class Reversed:

// When merging maps in a list with later maps taking precedence
//  and a non-map value replacing any resulting map so far,
// there is one catch to processing in reverse:
// one needs to tell *should be merged* from isCompleted
static class CompletableMap<K, V> extends LinkedHashMap<K, V> {
    private boolean completed = false;
    void       complete() { completed = true; }
    boolean isCompleted() { return completed; }

    CompletableMap() { super(); }
    CompletableMap(int initialCapacity) { super(initialCapacity); }
    CompletableMap(Map<? extends K, ? extends V> m) { super(m); }
}

/** merges <code>configurations</code>, later entries taking precedence. */
public static Map<String, Object> 
merge(List<Map<String, Object>> configurations) {
    Map<String, Object> configuration = new LinkedHashMap<>();
    if (null != configurations) {
        // for (ListIterator<Map<String, Object>> configs = configurations.listIterator(configurations.size()) ; configs.hasPrevious() ; )
        //     mergeMaps(configuration, configs.previous());
        for (Map<String, Object> previous : Reversed.reversed(configurations))
            mergeMaps(configuration, previous);
    }
    return configuration;
}

/** merges <code>accumulating</code>
 *     and <code>previousConfiguration</code>,
 *   entries in <code>accumulating</code> taking precedence.
 *///public 
static void mergeMaps(Map<String, Object> accumulating, 
                      Map<String, Object> previousConfiguration) {
    for (Map.Entry<String, Object> 
         previous : previousConfiguration.entrySet()) {
        String key = previous.getKey();
        Object
            accumulated = accumulating.get(key),
            previousValue = previous.getValue();
        if (previousValue instanceof Map) {
            Map<String, Object> 
                previousMap = (Map<String, Object>) previousValue;
            if (null == accumulated)
                accumulating.put(key,
                   new CompletableMap<String, Object>(previousMap));
            else if (accumulated instanceof Map
                     && !((CompletableMap<String, Object>)accumulated)
                          .isCompleted())
                mergeMaps((Map<String, Object>) accumulated, previousMap);
        } else if (null == accumulated)
            accumulating.put(key, previousValue);
        else if (accumulated instanceof CompletableMap)
            ((CompletableMap<String, Object>)accumulated).complete();
    }
}

public static void main(String []args) {
    List<Map<String, Object>> c = List.of(
        Map.of("a", "1"),
        Map.of("b", Map.of("c", "2", "d", "3")),
        Map.of("b", "4"),
        Map.of("b", Map.of("c", "5", "e", "6")),
        Map.of("b", Map.of("c", "7")),
        Map.of("b", Map.of("c", "8")));
    System.out.println(merge(c));
}

Caveat: untested beyond trying out the main() above.
Caught unawares: the top-level entries come out in reverse order.
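The Reversed helper referenced above is not shown in the answer; a minimal sketch of what it presumably looks like (a copy-free, reverse-order Iterable view over a List) could be:

```java
import java.util.Iterator;
import java.util.List;
import java.util.ListIterator;

// Minimal adapter exposing a List as a reverse-order Iterable, without
// copying the list: iteration walks a ListIterator backwards.
public class Reversed<T> implements Iterable<T> {
    private final List<T> original;

    private Reversed(List<T> original) { this.original = original; }

    public static <T> Reversed<T> reversed(List<T> original) {
        return new Reversed<>(original);
    }

    @Override
    public Iterator<T> iterator() {
        ListIterator<T> it = original.listIterator(original.size());
        return new Iterator<T>() {
            @Override public boolean hasNext() { return it.hasPrevious(); }
            @Override public T next() { return it.previous(); }
        };
    }
}
```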

  • From Java 19, CompletableMap(int initialCapacity) should probably call LinkedHashMap.newLinkedHashMap(initialCapacity).
    – greybeard
    Commented Jun 8, 2023 at 10:09
  • First of all, thanks for taking the time to write a response. Let me answer some of your smaller questions here, and expand on the original post after I edit it: 1. The code I pasted does not have comments because I removed them, as they are not that helpful. I will not only add them in the edit, but include the original code (which is Kotlin, not Java). 2. I know merge2 does not work as-is; that's why I added a comment explaining that a reversal of the list must be in place. 3. About the reversed thing... yeah, I get it, and I tried something similar, but the performance gains are minimal.
    – Cristian
    Commented Jun 8, 2023 at 20:12
  • expand on the original post after I edit it (Heeding "What should I not do when someone answers my question?", hopefully.)
    – greybeard
    Commented Jun 8, 2023 at 20:16
  • Fair. In my defense, the code I pasted, even though it's "better", still does not satisfy the original premise: how to efficiently merge a list of deeply nested maps.
    – Cristian
    Commented Jun 8, 2023 at 20:21
  • When asking a question about a Kotlin implementation of a solution to the same problem, provide more context. What are those 5000 configurations per second used for? There may be an approach less resource-hungry than creating a merged Map. Are you aware of the concept of a Map with a backup Map, delegating lookups to another Map for unmapped keys, possibly caching results?
    – greybeard
    Commented Jun 9, 2023 at 7:50
