238

Using jq, how can arbitrary JSON encoding an array of shallow objects be converted to CSV?

There are plenty of Q&As on this site that cover specific data models which hard-code the fields, but answers to this question should work given any JSON, with the only restriction that it's an array of objects with scalar properties (no deep/complex/sub-objects, as flattening these is another question). The result should contain a header row giving the field names. Preference will be given to answers that preserve the field order of the first object, but it's not a requirement. Results may enclose all cells with double-quotes, or only enclose those that require quoting (e.g. 'a,b').

Examples

  1. Input:

    [
        {"code": "NSW", "name": "New South Wales", "level":"state", "country": "AU"},
        {"code": "AB", "name": "Alberta", "level":"province", "country": "CA"},
        {"code": "ABD", "name": "Aberdeenshire", "level":"council area", "country": "GB"},
        {"code": "AK", "name": "Alaska", "level":"state", "country": "US"}
    ]
    

    Possible output:

    code,name,level,country
    NSW,New South Wales,state,AU
    AB,Alberta,province,CA
    ABD,Aberdeenshire,council area,GB
    AK,Alaska,state,US
    

    Possible output:

    "code","name","level","country"
    "NSW","New South Wales","state","AU"
    "AB","Alberta","province","CA"
    "ABD","Aberdeenshire","council area","GB"
    "AK","Alaska","state","US"
    
  2. Input:

    [
        {"name": "bang", "value": "!", "level": 0},
        {"name": "letters", "value": "a,b,c", "level": 0},
        {"name": "letters", "value": "x,y,z", "level": 1},
        {"name": "bang", "value": "\"!\"", "level": 1}
    ]
    

    Possible output:

    name,value,level
    bang,!,0
    letters,"a,b,c",0
    letters,"x,y,z",1
    bang,"""!""",1
    

    Possible output:

    "name","value","level"
    "bang","!","0"
    "letters","a,b,c","0"
    "letters","x,y,z","1"
    "bang","""!""","1"
    

9 Answers

317

First, obtain an array containing all the different object property names in your object array input. Those will be the columns of your CSV:

(map(keys) | add | unique) as $cols

Then, for each object in the object array input, map the column names you obtained to the corresponding properties in the object. Those will be the rows of your CSV.

map(. as $row | $cols | map($row[.])) as $rows

Finally, put the column names before the rows, as a header for the CSV, and pass the resulting row stream to the @csv filter.

$cols, $rows[] | @csv

All together now. Remember to use the -r flag to get the result as a raw string:

jq -r '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.])) as $rows | $cols, $rows[] | @csv'
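
For example, with the first sample input from the question saved as input.json (a filename assumed here), this produces the following. Note that the columns come out in alphabetical order, because keys returns the property names sorted:

$ jq -r '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.])) as $rows | $cols, $rows[] | @csv' input.json
"code","country","level","name"
"NSW","AU","state","New South Wales"
"AB","CA","province","Alberta"
"ABD","GB","council area","Aberdeenshire"
"AK","US","state","Alaska"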
  • It's nice that your solution captures all of the property names from all rows, rather than just the first. I wonder what the performance implications of this are for very large documents, though. P.S. If you want, you can get rid of the $rows variable assignment by just inlining it: (map(keys) | add | unique) as $cols | $cols, map(. as $row | $cols | map($row[.]))[] | @csv Commented Oct 6, 2015 at 15:59
  • Thanks, Jordan! I am aware that $rows does not have to be assigned to a variable; I just thought assigning it to a variable made the explanation nicer.
    – user3899165
    Commented Oct 6, 2015 at 19:38
  • consider converting the row value | string in case there's nested arrays or maps.
    – TJR
    Commented May 16, 2017 at 21:53
  • Good suggestion, @TJR. Maybe if there are nested structures, jq should recurse into them and make their values into columns as well Commented Mar 6, 2018 at 17:39
  • Would appropriate use of the command then be: your last code block and then my.json > my.csv on the same line?
    – user319487
    Commented Nov 9, 2018 at 8:02
187

The Skinny

jq -r '(.[0] | keys_unsorted) as $keys | $keys, map([.[ $keys[] ]])[] | @csv'

or:

jq -r '(.[0] | keys_unsorted) as $keys | ([$keys] + map([.[ $keys[] ]])) [] | @csv'

The Details

Aside

Describing the details is tricky because jq is stream-oriented, meaning it operates on a sequence of JSON data, rather than a single value. The input JSON stream gets converted to some internal type which is passed through the filters, then encoded in an output stream at program's end. The internal type isn't modeled by JSON, and doesn't exist as a named type. It's most easily demonstrated by examining the output of a bare index (.[]) or the comma operator (examining it directly could be done with a debugger, but that would be in terms of jq's internal data types, rather than the conceptual data types behind JSON).

$ jq -c '.[]' <<<'["a", "b"]'
"a"
"b"
$ jq -cn '"a", "b"'
"a"
"b"

Note that the output isn't an array (which would be ["a", "b"]). Compact output (the -c option) shows that each array element (or argument to the , filter) becomes a separate entity in the output stream (each is printed on a separate line).

A stream is like a JSON-seq, but uses newlines rather than RS as an output separator when encoded. Consequently, this internal type is referred to by the generic term "sequence" in this answer, with "stream" being reserved for the encoded input and output.

Constructing the Filter

The first object's keys can be extracted with:

.[0] | keys_unsorted

Keys will generally be kept in their original order, but preserving the exact order isn't guaranteed. Consequently, they will need to be used to index the objects to get the values in the same order. This will also prevent values being in the wrong columns if some objects have a different key order.
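
As a quick check of that claim, here is the one-liner from The Skinny run against a small made-up input whose second object lists its keys in the opposite order; both rows still line up under the header:

$ jq -r '(.[0] | keys_unsorted) as $keys | $keys, map([.[ $keys[] ]])[] | @csv' <<<'[{"a":1,"b":2},{"b":20,"a":10}]'
"a","b"
1,2
10,20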

To both output the keys as the first row and make them available for indexing, they're stored in a variable. The next stage of the pipeline then references this variable and uses the comma operator to prepend the header to the output stream.

(.[0] | keys_unsorted) as $keys | $keys, ...

The expression after the comma is a little involved. The index operator on an object can take a sequence of strings (e.g. "name", "value"), returning a sequence of property values for those strings. $keys is an array, not a sequence, so [] is applied to convert it to a sequence,

$keys[]

which can then be passed to .[]

.[ $keys[] ]

This, too, produces a sequence, so the array constructor is used to convert it to an array.

[.[ $keys[] ]]

This expression is to be applied to a single object. map() is used to apply it to all objects in the outer array:

map([.[ $keys[] ]])

Lastly for this stage, this is converted to a sequence so each item becomes a separate row in the output.

map([.[ $keys[] ]])[]
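
On its own (before @csv), this stage yields one array per input object. Using the same small made-up input as above:

$ jq -c '(.[0] | keys_unsorted) as $keys | map([.[ $keys[] ]])[]' <<<'[{"a":1,"b":2},{"b":20,"a":10}]'
[1,2]
[10,20]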

Why bundle the sequence into an array within the map only to unbundle it outside? map produces an array; .[ $keys[] ] produces a sequence. Applying map to the sequence from .[ $keys[] ] would produce an array of sequences of values, but since sequences aren't a JSON type, you instead get a flattened array containing all the values:

["NSW","New South Wales","state","AU","AB","Alberta","province","CA","ABD","Aberdeenshire","council area","GB","AK","Alaska","state","US"]

The values from each object need to be kept separate, so that they become separate rows in the final output.

Finally, the sequence is passed through the @csv formatter.
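
Run against the first sample input from the question (assumed saved as test.json), the complete filter keeps the first object's key order in the header:

$ jq -r '(.[0] | keys_unsorted) as $keys | $keys, map([.[ $keys[] ]])[] | @csv' test.json
"code","name","level","country"
"NSW","New South Wales","state","AU"
"AB","Alberta","province","CA"
"ABD","Aberdeenshire","council area","GB"
"AK","Alaska","state","US"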

Alternate

The items can be separated late, rather than early. Instead of using the comma operator to get a sequence (passing a sequence as the right operand), the header array ($keys) can be wrapped in an outer array, and + used to append the array of rows. The result still needs to be converted to a sequence before being passed to @csv.
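
The intermediate value, before the final [] and @csv, is then a single array holding the header row followed by the data rows (again using the small made-up input from earlier):

$ jq -c '(.[0] | keys_unsorted) as $keys | [$keys] + map([.[ $keys[] ]])' <<<'[{"a":1,"b":2},{"b":20,"a":10}]'
[["a","b"],[1,2],[10,20]]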

  • Can you use keys_unsorted instead of keys to preserve the key order from the first object? Commented Oct 6, 2015 at 15:53
  • @outis - The preamble about streams is somewhat inaccurate. The simple fact is that jq filters are stream-oriented. That is, any filter can accept a stream of JSON entities, and some filters can produce a stream of values. There is no "new line" or any other separator between the items in a stream -- it's only when they're printed that a separator is introduced. To see for yourself, try: jq -n -c 'reduce ("a","b") as $s (""; . + $s)'
    – peak
    Commented Dec 15, 2015 at 6:21
  • Did something happen between when this was written and now to render it incorrect? The problem seems to be in the map, which, breaks even on a toy example: $ echo '{"a":1,"b":2,"c":3}' |jq -r '(. | keys_unsorted) as $keys| $keys, map( [.[ $keys[] ] ])[] | @csv' outputs "a","b","c" jq: error (at <stdin>:1): Cannot index number with string "a" on jq-1.5.
    – Wyatt
    Commented Mar 20, 2017 at 20:58
  • @Wyatt: take a closer look at your data and the example input. The question is about an array of objects, not a single object. Try [{"a":1,"b":2,"c":3}].
    – outis
    Commented Mar 25, 2017 at 11:01
  • Working through the details of this solution taught me a LOT about jq! For anyone else struggling with the details, it may be helpful to play with "jq -cr '(.[0] | keys_unsorted) as $array_of_keys | $array_of_keys, (.[] | [ .[$array_of_keys[]] ]) | .'", since that's how the map filter is implemented. And remember that the "(foo) as $bar" variable assignment actually acts as a for-each that iterates over all the items in the (foo) expression (not an issue in this case, since we're pulling out the keys as a single item).
    – Roy Wood
    Commented Jul 10, 2017 at 15:57
31
$ cat test.json
[
    {"code": "NSW", "name": "New South Wales", "level":"state", "country": "AU"},
    {"code": "AB", "name": "Alberta", "level":"province", "country": "CA"},
    {"code": "ABD", "name": "Aberdeenshire", "level":"council area", "country": "GB"},
    {"code": "AK", "name": "Alaska", "level":"state", "country": "US"}
]


$ jq -r '["Code", "Name", "Level", "Country"], (.[] | [.code, .name, .level, .country]) | @tsv ' test.json
Code    Name    Level   Country
NSW New South Wales state   AU
AB  Alberta province    CA
ABD Aberdeenshire   council area    GB
AK  Alaska  state   US


$ jq -r '["Code", "Name", "Level", "Country"], (.[] | [.code, .name, .level, .country]) | @csv ' test.json
"Code","Name","Level","Country"
"NSW","New South Wales","state","AU"
"AB","Alberta","province","CA"
"ABD","Aberdeenshire","council area","GB"
"AK","Alaska","state","US"
17

The following filter is slightly different in that it will ensure every value is converted to a string. (jq 1.5+)

# For an array of many objects
jq -f filter.jq [file]

# For many objects (not within array)
jq -s -f filter.jq [file]

Filter: filter.jq

def tocsv:
    (map(keys)
        |add
        |unique
        |sort
    ) as $cols
    |map(. as $row
        |$cols
        |map($row[.]|tostring)
    ) as $rows
    |$cols,$rows[]
    | @csv;

tocsv
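
For example, with the question's second sample input saved as input.json (a filename assumed here), and adding the -r flag so the CSV lines are printed raw rather than as JSON-encoded strings, the level column comes first because the keys are sorted:

$ jq -r -f filter.jq input.json
"level","name","value"
"0","bang","!"
"0","letters","a,b,c"
"1","letters","x,y,z"
"1","bang","""!"""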
  • This works good for simple JSON but what about JSON with nested properties that go down many levels?
    – Amir
    Commented Dec 28, 2017 at 21:14
  • This of course sorts the keys. Also the output of unique is sorted anyway, so unique|sort can be simplified to unique.
    – peak
    Commented Apr 24, 2018 at 22:32
  • @TJR When using this filter it is mandatory to switch on raw output using -r option. Otherwise all the quotes " become extra-escaped which is not valid CSV.
    – Anthony
    Commented May 13, 2019 at 15:01
  • Amir: nested properties don't map to CSV. Commented Jun 13, 2019 at 11:02
  • @Amir: adding to chrishmorris' comment, this question is explicitly restricted to "array[s] of objects with scalar properties (no deep/complex/sub-objects, as flattening these is another question)".
    – outis
    Commented Jun 23, 2021 at 22:48
9

I created a function that outputs an array of objects or arrays to CSV, with headers. The columns will be in the order of the headers.

def to_csv($headers):
    def _object_to_csv:
        ($headers | @csv),
        (.[] | [.[$headers[]]] | @csv);
    def _array_to_csv:
        ($headers | @csv),
        (.[][:$headers|length] | @csv);
    if .[0]|type == "object"
        then _object_to_csv
        else _array_to_csv
    end;

So you could use it like so:

to_csv([ "code", "name", "level", "country" ])
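
For example, assuming the definition above plus that call are saved in a file named to_csv.jq (both the filename and the input filename test.json are assumed here), the first sample input from the question gives:

$ jq -r -f to_csv.jq test.json
"code","name","level","country"
"NSW","New South Wales","state","AU"
"AB","Alberta","province","CA"
"ABD","Aberdeenshire","council area","GB"
"AK","Alaska","state","US"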
6

If you're open to using other Unix tools, csvkit has an in2csv tool:

in2csv example.json

Using your sample data:

> in2csv example.json
code,name,level,country
NSW,New South Wales,state,AU
AB,Alberta,province,CA
ABD,Aberdeenshire,council area,GB
AK,Alaska,state,US 

I like the pipe approach for piping directly from jq:

cat example.json | in2csv -f json -
4

This variant of Santiago's program is also safe but ensures that the key names in the first object are used as the first column headers, in the same order as they appear in that object:

def tocsv:
  if length == 0 then empty
  else
    (.[0] | keys_unsorted) as $firstkeys
    | (map(keys) | add | unique) as $allkeys
    | ($firstkeys + ($allkeys - $firstkeys)) as $cols
    | ($cols, (.[] as $row | $cols | map($row[.])))
    | @csv
  end ;

tocsv
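
For example, with the filter saved as tocsv.jq (a filename assumed here) and run against a small made-up input, keys that only appear in later objects are appended after the first object's columns, and missing values become empty cells:

$ jq -r -f tocsv.jq <<<'[{"b":1,"a":2},{"a":3,"c":4}]'
"b","a","c"
1,2,
,3,4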
3

Here's a compact solution:

(map(keys)|add|unique)as$k|$k,(.[]|[.[$k[]]])|@csv
  • (map(keys) | add | unique) as $k sets $k to all unique keys found in the array of objects
  • .[$k[]] indexes the object with each key in $k, producing the corresponding values
  • (.[] | [ ... ]) returns an array of values for each object.
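
A quick illustration with a made-up input; keys missing from an object simply become empty cells:

$ jq -r '(map(keys)|add|unique)as$k|$k,(.[]|[.[$k[]]])|@csv' <<<'[{"a":1},{"b":2}]'
"a","b"
1,
,2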
  • This is a nice solution for compactness and it also preserves the order of the values in the rows with respect to the header of keys.
    – JHS
    Commented Apr 22 at 21:03
2

A simple way is to just use string concatenation. If your input is a proper array:

# filename.txt
[
  {"field1":"value1", "field2":"value2"},
  {"field1":"value1", "field2":"value2"},
  {"field1":"value1", "field2":"value2"}
]

then index with .[]:

cat filename.txt | jq -r '.[] | .field1 + ", " + .field2'

or if it's just line by line objects:

# filename.txt
{"field1":"value1", "field2":"value2"}
{"field1":"value1", "field2":"value2"}
{"field1":"value1", "field2":"value2"}

just do this:

cat filename.txt | jq -r '.field1 + ", " + .field2'
  • To highlight why this answer is getting downvotes: 1. it has missed the question's core, which is "arbitrary JSON" without hard-coding fields. 2. using string concatenation for conversion is bad in general, as it can result in bad data in the output, simply think about the output for {"field1":"value1,value3", "field2":"value2"}.
    – TWiStErRob
    Commented Dec 13, 2022 at 11:59
  • Helped me, though. Thanks Nick. Commented Jul 6, 2023 at 15:02
