227

I would like to merge arrays in YAML and load them via Ruby:

some_stuff: &some_stuff
 - a
 - b
 - c

combined_stuff:
  <<: *some_stuff
  - d
  - e
  - f

I'd like to have the combined array as [a,b,c,d,e,f]

I receive the error: did not find expected key while parsing a block mapping

How do I merge arrays in YAML?

2
  • 24
    @PatrickCollins I found this question trying to reduce duplication in my .gitlab-ci.yml file and unfortunately I have no control over the parser that GitLab CI uses :( Commented Feb 8, 2019 at 8:18
  • As a fallback, if the application code is also maintained by you, one can recursively merge them there, similar to how I did it here.
    – Asclepius
    Commented Jan 26, 2020 at 20:56

6 Answers

93

If the aim is to run a sequence of shell commands, you may be able to achieve this as follows:

# note: no dash before commands
some_stuff: &some_stuff |-
    a
    b
    c

combined_stuff:
  - *some_stuff
  - d
  - e
  - f

This is equivalent to:

some_stuff: "a\nb\nc"

combined_stuff:
  - "a\nb\nc"
  - d
  - e
  - f

I have been using this in my .gitlab-ci.yml (in response to @rink.attendant.6's comment on the question).


Here is a working example that we use so that requirements.txt can reference private GitLab repositories:

.pip_git: &pip_git
- git config --global url."https://gitlab-ci-token:${CI_JOB_TOKEN}@gitlab.com".insteadOf "ssh://[email protected]"
- mkdir -p ~/.ssh
- chmod 700 ~/.ssh
- echo "$SSH_KNOWN_HOSTS" > ~/.ssh/known_hosts
- chmod 644 ~/.ssh/known_hosts

test:
    image: python:3.7.3
    stage: test
    script:
        - *pip_git
        - pip install -q -r requirements_test.txt
        - python -m unittest discover tests

The same `*pip_git` alias can be reused in other jobs, e.g. one that builds an image.

Here requirements_test.txt contains, for example:

-e git+ssh://[email protected]/example/[email protected]#egg=example

12
  • 8
    Clever. I'm using it in our Bitbucket pipeline now. Thanks Commented Oct 9, 2019 at 14:03
  • 1
    The trailing dash is not required here; the pipe alone is enough. Also, this is an inferior solution, since when the job fails on a very long multi-line statement it's not clear which command failed.
    – Mina Luke
    Commented Oct 16, 2019 at 5:05
  • 3
    @MinaLuke, inferior in comparison to what? None of the current answers provide a way to merge two items using only yaml... Moreover, there is nothing in the question stating that the OP wishes to use this in CI/CD. Finally, when this is used in CI/CD, logging only depends on the particular CI/CD used, not on the yaml declaration. So, if anything, the CI/CD that you are referring to is the one doing a bad job. The yaml in this answer is valid, and solves OP's problem. Commented Oct 16, 2019 at 6:07
  • 1
    It doesn't work for me. With - I get an error, as if it is trying to insert the list inside a list item. I don't know how to use the pipe; what is it for? How did @Dariop manage to use it in BB Pipelines? Commented May 25, 2020 at 20:50
  • 2
    I'm suspicious about this. Please correct me if I'm wrong, but if it gets transformed to "a\nb\nc", this would mean gitlab-runner won't get a chance to check exit codes of a and b. So if any command in the sequence except the last one has failed, this wouldn't interrupt pipeline and runner would continue executing the rest of the commands..
    – Hi-Angel
    Commented Oct 13, 2020 at 14:20
41

This is not going to work:

  1. merge is only supported by the YAML specifications for mappings and not for sequences

  2. you are mixing things up by having a merge key << followed by the key/value separator : and a value that is an alias, and then continuing with a list at the same indentation level

This is not correct YAML:

combine_stuff:
  x: 1
  - a
  - b

So your example syntax would not even make sense as a YAML extension proposal.

If you want to do something like merging multiple arrays you might want to consider a syntax like:

combined_stuff:
  - <<: *s1, *s2
  - <<: *s3
  - d
  - e
  - f

where s1, s2, s3 are anchors on sequences (not shown) that you want to merge into a new sequence and then have the d, e and f appended to that. But YAML resolves these kinds of structures depth first, so there is no real context available during the processing of the merge key. There is no array/list available to which you could attach the processed value (the anchored sequence).

You can take the approach as proposed by @dreftymac, but this has the huge disadvantage that you somehow need to know which nested sequences to flatten (i.e. by knowing the "path" from the root of the loaded data structure to the parent sequence), or that you recursively walk the loaded data structure searching for nested arrays/lists and indiscriminately flatten all of them.
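To make those two options concrete, here is a minimal Python sketch. The function names (`flatten_once`, `flatten_at`, `flatten_everywhere`) are hypothetical, and the `loaded` dict merely stands in for whatever a YAML loader would return. The first variant only touches the list at a known path; the second walks everything and flattens every nested list it meets.

```python
from typing import Any, List, Sequence


def flatten_once(items: List[Any]) -> List[Any]:
    """Flatten one level: nested lists are spliced in, other items are kept."""
    out: List[Any] = []
    for item in items:
        if isinstance(item, list):
            out.extend(item)
        else:
            out.append(item)
    return out


def flatten_at(data: Any, path: Sequence[Any]) -> None:
    """Flatten, in place, only the list reached by following `path` from the root."""
    parent = data
    for key in path[:-1]:
        parent = parent[key]
    parent[path[-1]] = flatten_once(parent[path[-1]])


def flatten_everywhere(data: Any) -> Any:
    """Walk the whole structure and flatten every nested list that is found."""
    if isinstance(data, dict):
        return {key: flatten_everywhere(value) for key, value in data.items()}
    if isinstance(data, list):
        out: List[Any] = []
        for item in data:
            item = flatten_everywhere(item)
            if isinstance(item, list):
                out.extend(item)
            else:
                out.append(item)
        return out
    return data


# Stand-in for a loaded document: only "combined" is meant to be flattened.
loaded = {
    "combined": [["a", "b", "c"], "d", "e", "f"],
    "matrix": [[1, 2], [3, 4]],
}

# Option 1: the caller knows the path to the list that needs flattening.
targeted = {"combined": list(loaded["combined"]), "matrix": list(loaded["matrix"])}
flatten_at(targeted, ["combined"])

# Option 2: flatten indiscriminately; "matrix" loses its nesting as well.
everywhere = flatten_everywhere(loaded)
```

With the path-based variant `matrix` keeps its rows, whereas the indiscriminate walk turns it into a single flat list, which is exactly the disadvantage described above.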

A better solution IMO would be to use tags to load data structures that do the flattening for you. This allows for clearly denoting what needs to be flattened and what not and gives you full control over whether this flattening is done during loading, or done during access. Which one to choose is a matter of ease of implementation and efficiency in time and storage space. This is the same trade-off that needs to be made for implementing the merge key feature and there is no single solution that is always the best.

E.g. my ruamel.yaml library uses the brute-force merge of dicts during loading when using its safe-loader, which results in merged dictionaries that are normal Python dicts. This merging has to be done up-front, and duplicates data (space inefficient) but is fast in value lookup. When using the round-trip-loader, you want to be able to dump the merges unmerged, so they need to be kept separate. The dict-like data structure loaded as a result of round-trip-loading is space efficient but slower in access, as it needs to look up a key not found in the dict itself in the merges (and this is not cached, so it needs to be done every time). Of course such considerations are not very important for relatively small configuration files.
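As a rough analogy for that trade-off (this uses only the Python standard library and is not how ruamel.yaml is implemented internally), compare an up-front merged dict with a `collections.ChainMap` that keeps the parts separate. The names `base` and `own` are made up for the illustration.

```python
from collections import ChainMap

base = {"a": 1, "b": 2}    # plays the role of the anchored mapping
own = {"b": 20, "c": 3}    # keys written directly in the merging mapping

# Up-front merge: the data is copied once and lookups are plain dict lookups.
# Keys given explicitly override the merged-in ones.
merged_eagerly = {**base, **own}

# Lazy merge: nothing is copied; a key missing from `own` is looked up
# in `base` again on every access.
merged_lazily = ChainMap(own, base)

# Because the lazy view keeps the parts separate, a later change to `base`
# is visible through it, but not through the eager copy.
base["a"] = 100
eager_a = merged_eagerly["a"]
lazy_a = merged_lazily["a"]
```

The eager copy is what you want for fast lookups; the separate parts are what you need if the original, unmerged structure has to be written back out.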


The following implements a merge-like scheme for lists in Python, using objects tagged !flatten that, on the fly, recurse into items that are lists tagged !toflatten. Using these two tags you can have a YAML file like:

l1: &x1 !toflatten
  - 1 
  - 2
l2: &x2
  - 3 
  - 4
m1: !flatten
  - *x1
  - *x2
  - [5, 6]
  - !toflatten [7, 8]

(the use of flow vs block style sequences is completely arbitrary and has no influence on the loaded result).

When iterating over the items that are the value for key m1 this "recurses" into the sequences tagged with toflatten, but displays other lists (aliased or not) as a single item.

One possible way with Python code to achieve that is:

from pathlib import Path

import ruamel.yaml

yaml = ruamel.yaml.YAML()


@yaml.register_class
class Flatten(list):
    """Sequence tagged !flatten; iterating splices in any ToFlatten items."""
    yaml_tag = '!flatten'

    def __init__(self, *args):
        # the elements are kept in self.items; the underlying list stays empty
        self.items = args

    @classmethod
    def from_yaml(cls, constructor, node):
        return cls(*constructor.construct_sequence(node, deep=True))

    def __iter__(self):
        for item in self.items:
            if isinstance(item, ToFlatten):
                # splice the elements of a !toflatten sequence in place
                for nested_item in item:
                    yield nested_item
            else:
                # any other item, including an untagged list, stays one item
                yield item


@yaml.register_class
class ToFlatten(list):
    """Sequence tagged !toflatten; marks a list that Flatten should expand."""
    yaml_tag = '!toflatten'

    @classmethod
    def from_yaml(cls, constructor, node):
        return cls(constructor.construct_sequence(node, deep=True))


data = yaml.load(Path('input.yaml'))
for item in data['m1']:
    print(item)

which outputs:

1
2
[3, 4]
[5, 6]
7
8

As you can see, in the sequence that needs flattening you can use either an alias to a tagged sequence or a tagged sequence itself. YAML doesn't allow you to do:

- !flatten *x2

, i.e. tag an alias to an anchored sequence, as this would essentially make it into a different data structure.

Using explicit tags is IMO better than having some magic going on as with YAML merge keys <<. If nothing else, you now have to jump through hoops if you happen to have a YAML file with a mapping that has a key << that you don't want to act like a merge key, e.g. when you make a mapping of C operators to their descriptions in English (or some other natural language).

1
  • What is the meaning of: "merge is only supported by the YAML specifications for mappings and not for sequences"? yaml.org/type/merge.html suggests the opposite: "If the value associated with the merge key is a sequence, then this sequence is expected to contain mapping nodes and each of these nodes is merged in turn according to its order in the sequence." Is the distinction that YAML specifies merge for sequences, but only if they contain mapping nodes, not arbitrary sequences? Commented Nov 23, 2022 at 4:17
34

Update: 2019-07-01 14:06:12

  • Note: another answer to this question was substantially edited with an update on alternative approaches.
    • That updated answer mentions an alternative to the workaround in this answer. It has been added to the See also section below.

Context

This post assumes the following context:

  • Python 2.7
  • a Python YAML parser

Problem

lfender6445 wishes to merge two or more lists within a YAML file, and have those merged lists appear as one singular list when parsed.

Solution (Workaround)

This can be achieved by assigning YAML anchors to the lists, which appear as values of keys in the top-level mapping, and then referring to them by alias. There are caveats, however (see "Pitfalls" below).

In the example below we have three lists (list_one, list_two, list_three), each with its own anchor, and a fourth list whose items are aliases that refer to them.

When the YAML file is loaded in the program we get the list we want, but it may require a little modification after load (see pitfalls below).

Example

Original YAML file

  list_one: &id001
   - a
   - b
   - c

  list_two: &id002
   - e
   - f
   - g

  list_three: &id003
   - h
   - i
   - j

  list_combined:
      - *id001
      - *id002
      - *id003

Result after YAML.safe_load

## list_combined
  [
    [
      "a",
      "b",
      "c"
    ],
    [
      "e",
      "f",
      "g"
    ],
    [
      "h",
      "i",
      "j"
    ]
  ]

Pitfalls

  • this approach produces a nested list of lists, which may not be the exact desired output, but this can be post-processed using the flatten method
  • the usual caveats to YAML anchors and aliases apply for uniqueness and declaration order

Conclusion

This approach allows creation of merged lists by use of the alias and anchor feature of YAML.

Although the output result is a nested list of lists, this can be easily transformed using the flatten method.
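In Ruby, Array#flatten does this directly. Python lists have no flatten method, so under the Python context assumed in this post a one-level flatten could be sketched as follows. The variable names are illustrative, and the approach assumes every item of list_combined is itself a list (a bare string would be split into characters).

```python
from itertools import chain

# Result of loading the example above: a list of three lists.
list_combined = [
    ["a", "b", "c"],
    ["e", "f", "g"],
    ["h", "i", "j"],
]

# Splice the inner lists together, one level deep.
flat = list(chain.from_iterable(list_combined))
print(flat)
```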

See also

Updated alternative approach by @Anthon

Examples of the flatten method

31

If you only need to merge one item into a list you can do

fruit:
  - &banana
    name: banana
    colour: yellow

food:
  - *banana
  - name: carrot
    colour: orange

which yields

fruit:
  - name: banana
    colour: yellow

food:
  - name: banana
    colour: yellow
  - name: carrot
    colour: orange
3

Another way to enable merging arrays in Python is to define a !flatten tag. (This uses PyYAML, unlike Anthon's answer above. This may be necessary when you don't have control over which package is used in the back end, e.g., anyconfig.)

from typing import Iterator, List

import yaml


def flatten_sequence(sequence: yaml.Node) -> Iterator[str]:
    """Flatten a nested sequence to a list of strings
        A nested structure is always a SequenceNode
    """
    if isinstance(sequence, yaml.ScalarNode):
        yield sequence.value
        return
    if not isinstance(sequence, yaml.SequenceNode):
        raise TypeError(f"'!flatten' can only flatten sequence nodes, not {sequence}")
    for el in sequence.value:
        if isinstance(el, yaml.SequenceNode):
            yield from flatten_sequence(el)
        elif isinstance(el, yaml.ScalarNode):
            yield el.value
        else:
            raise TypeError(f"'!flatten' can only take scalar nodes, not {el}")


def construct_flat_list(loader: yaml.Loader, node: yaml.Node) -> List[str]:
    """Make a flat list, should be used with '!flatten'

    Args:
        loader: Unused, but necessary to pass to `yaml.add_constructor`
        node: The passed node to flatten
    """
    return list(flatten_sequence(node))


# register the constructor only after the function it refers to is defined
yaml.add_constructor("!flatten", construct_flat_list)

This recursive flattening takes advantage of the PyYAML document structure, which parses all arrays as SequenceNodes, and all values as ScalarNodes. The behavior can be tested (and modified) in the following test function.

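# run with pytest
def test_flatten_yaml():
    # single nest
    param_string = """
    bread: &bread
      - toast
      - loafs
    chicken: &chicken
      - *bread
    midnight_meal: !flatten
      - *chicken
      - *bread
    """
    # PyYAML 6 requires an explicit Loader; yaml.add_constructor registers
    # on yaml.Loader by default, so that is the loader used here
    params = yaml.load(param_string, Loader=yaml.Loader)
    assert sorted(params["midnight_meal"]) == sorted(
        ["toast", "loafs", "toast", "loafs"]
    )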
1

You can merge mappings then convert their keys into a list, under these conditions:

  • if you are using jinja2 templating and
  • if item order is not important
some_stuff: &some_stuff
 a:
 b:
 c:

combined_stuff:
  <<: *some_stuff
  d:
  e:
  f:

{{ combined_stuff | list }}
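In plain Python terms, this is roughly what happens. This is a sketch only: the dict literals stand in for what a YAML loader with merge-key support would produce (keys with empty values load as None), and `list(...)` on a dict returns its keys, which is what Jinja2's list filter does with a mapping.

```python
# Stand-in for the anchored mapping; keys with empty values load as None.
some_stuff = {"a": None, "b": None, "c": None}

# Stand-in for the mapping after the merge key has been applied.
combined_stuff = {**some_stuff, "d": None, "e": None, "f": None}

# Turning a dict into a list yields its keys.
keys = list(combined_stuff)
print(sorted(keys))
```

Sorting the result is one way to avoid depending on the order in which the loader inserted the keys.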
4
  • What's wrong with this answer? I don't mind downvotes if they come with an explanation. I'll keep the answer for people who can make use of it.
    – sm4rk0
    Commented Nov 14, 2019 at 12:55
  • 6
    Likely because this answer relies on jinja2 templating, when the question asks to do it in yml. jinja2 requires a Python environment, which is counter-productive if the OP is trying to DRY. Also, many CI/CD tools do not accept a templating step. Commented Nov 22, 2019 at 13:54
  • 1
    Thanks @JorgeLeitao. That makes sense. I learned YAML and Jinja2 together while developing Ansible playbooks and templates and can't think about one without another
    – sm4rk0
    Commented Nov 22, 2019 at 21:18
  • 1
    Well, I came here for Ansible YAML as well; but what you did, @sm4rk0, is not combining lists but hashes (dictionaries). And yes, it works, it's even documented here, but unfortunately adding an item to an existing list in an inventory file does not seem to be possible...
    – 4wk_
    Commented Aug 4, 2022 at 14:46
