Merging CSV files : Appending instead of merging

Question

So basically I want to merge a couple of CSV files. I'm using the following command to do that:

paste -d , *.csv > final.txt

This has worked for me in the past, but this time it doesn't. It places the data side by side instead of one file below the other. For instance, two files that contain records in the following format

CreatedAt   ID
Mon Jul 07 20:43:47 +0000 2014  4.86249E+17
Mon Jul 07 19:58:29 +0000 2014  4.86238E+17
Mon Jul 07 19:42:33 +0000 2014  4.86234E+17

When merged give

CreatedAt   ID CreatedAt    ID
Mon Jul 07 20:43:47 +0000 2014  4.86249E+17 Mon Jul 07 18:25:53 +0000 2014  4.86215E+17
Mon Jul 07 19:58:29 +0000 2014  4.86238E+17 Mon Jul 07 17:19:18 +0000 2014  4.86198E+17
Mon Jul 07 19:42:33 +0000 2014  4.86234E+17 Mon Jul 07 15:45:13 +0000 2014  4.86174E+17
                                            Mon Jul 07 15:34:13 +0000 2014  4.86176E+17

Would anyone know the reason behind this? Or what I can do to force the records to be merged one below the other?

It seems like one of your .csv files has more lines than the other. Not sure where you are getting the spaces from: with -d , the paste command uses "," to separate the entries. — AKS, Commented Jul 8, 2014 at 21:51
Do you mean cat file*.csv > final.csv? That would give you records "below each other". Good luck. — shellter, Commented Jul 8, 2014 at 22:07
@ArunSangal: Yes, but the count shouldn't matter for a join, should it? Cyrus: Yes, I mean join. The purpose of -d , was to separate the fields with a comma. Also, the answer below worked. — user2233834, Commented Jul 9, 2014 at 9:38

Hastur · Accepted Answer · 2019-01-21 21:00:14Z

Assuming that all the CSV files have the same format and all start with the same header, you can write a small script like the following to append all the files into a single one while keeping the header only once.

#!/bin/bash
OutFileName="./X.csv"                     # Output name; the ./ prefix must match the glob below
i=0                                       # Reset a counter
for filename in ./*.csv; do
 if [ "$filename" != "$OutFileName" ] ;      # Skip the output file itself (avoid recursion)
 then
   if [[ $i -eq 0 ]] ; then
      head -n 1 "$filename" > "$OutFileName" # Copy the header if it is the first file
   fi
   tail -n +2 "$filename" >> "$OutFileName"  # Append each file from its 2nd line on
   i=$(( i + 1 ))                            # Increase the counter
 fi
done

Notes:

The head -1 or head -n 1 command prints the first line of a file (the head).
The tail -n +2 command prints the tail of a file starting from line number 2 (+2).
The test [ ... ] is used to exclude the output file from the input list.
The output file is overwritten on each run.
The command cat a.csv b.csv > X.csv can be used to simply append a.csv and b.csv into a single file (but the header is copied twice).
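The duplicated header mentioned in the last note can also be avoided in a single pass with awk. This is only a sketch, not part of the original script: the sample files and their contents are made up for illustration, and it assumes every input starts with the same one-line header.

```shell
#!/bin/bash
# Hypothetical sample data in a throwaway directory.
workdir=$(mktemp -d)
printf 'CreatedAt,ID\nMon Jul 07 20:43:47,1\n' > "$workdir/a.csv"
printf 'CreatedAt,ID\nMon Jul 07 18:25:53,2\n' > "$workdir/b.csv"

# NR counts lines across all inputs, FNR restarts at 1 for each file.
# A line is printed if it is the very first line overall (the header)
# or if it is not the first line of its own file (a data row).
merged=$(awk 'FNR > 1 || NR == 1' "$workdir/a.csv" "$workdir/b.csv")
printf '%s\n' "$merged"

rm -rf "$workdir"
```

The header is taken from whichever file is listed first, so the order of the arguments decides where it comes from.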

The paste command places the files side by side: line N of each file is joined into line N of the output. If one file has more lines than the other, or contains lines made only of white space, you can get output like the one you reported above.
The -d , option tells paste to separate the pasted columns with a comma, but the files you showed above are not comma-separated.

The cat command, by contrast, concatenates files and prints them on standard output, which means it writes one file after the other.
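To see the difference between the two commands, here is a small sketch on two made-up files of unequal length (the file names and contents are assumptions for illustration only):

```shell
#!/bin/bash
# Hypothetical inputs: a.txt has two lines, b.txt has three.
workdir=$(mktemp -d)
printf 'a1\na2\n' > "$workdir/a.txt"
printf 'b1\nb2\nb3\n' > "$workdir/b.txt"

# paste joins corresponding lines; the shorter file contributes empty fields.
side_by_side=$(paste -d , "$workdir/a.txt" "$workdir/b.txt")

# cat writes the whole first file, then the whole second file.
stacked=$(cat "$workdir/a.txt" "$workdir/b.txt")

printf 'paste:\n%s\n\ncat:\n%s\n' "$side_by_side" "$stacked"
rm -rf "$workdir"
```

The last paste line comes out as ,b3 because a.txt has run out of lines, which is the same effect as the lone trailing record in the question.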

Refer to man head or man tail for the syntax of the individual options (some versions accept head -1, others require head -n 1).

Now I see what he meant. By the way, you can put the increment of the "i" variable inside the if statement instead of directly in the loop body. — AKS, Commented Jul 9, 2014 at 19:57
@ArunSangal You're right. My mistake, I copied an old version. If the increment is outside the if block and the output file is the first in the list, you will never get the header in the output file. — Hastur, Commented Jul 10, 2014 at 0:10
Noticed a small corner case issue: it breaks if filenames contain spaces. Can be fixed by adding some quotes: "$filename". — Jonik, Commented Jan 17, 2019 at 10:13
@Jonik Right and proper, thanks; fixed. It is devious to peek around a corner ... As you do, you risk spotting another one: better to put quotes around $OutFileName as well ;-) — Hastur, Commented Jan 21, 2019 at 21:11

Alex · Accepted Answer · 2021-12-13 16:29:43Z

A simple alternative: save this as combine_csv.sh:

#!/bin/bash
{ head -n 1 "$1" && tail -q -n +2 "$@"; }   # Header of the first file, then the data rows of all files

can be used like this:

pattern="my*filenames*.csv"
combine_csv.sh ${pattern} > result.csv
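If any file name contains spaces, the arguments have to stay quoted all the way through. Below is a rough sketch of the same idea as a shell function; the function name combine_csv and the sample files are made up for illustration, and it assumes a tail that supports -q (GNU coreutils does).

```shell
#!/bin/bash
# combine_csv FILE...: print the header of the first file,
# then the data rows (line 2 onward) of every file.
combine_csv() {
  head -n 1 "$1" && tail -q -n +2 "$@"
}

# Hypothetical inputs whose names contain spaces.
workdir=$(mktemp -d)
printf 'h1,h2\n1,2\n' > "$workdir/my first.csv"
printf 'h1,h2\n3,4\n' > "$workdir/my second.csv"

# The glob is left unquoted so it expands; each match stays one argument.
combined=$(combine_csv "$workdir"/my*.csv)
printf '%s\n' "$combined"

rm -rf "$workdir"
```

Passing the glob directly, rather than storing it in a variable first, keeps each matching file name as a single argument.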

Answered by Gerriet, Dec 13, 2020 at 17:04; edited by Alex, Dec 13, 2021 at 16:29.

Nice, & should be && though — user239558, Commented Sep 4, 2021 at 6:40

Andrea · Accepted Answer · 2017-03-23 08:45:51Z

Thank you so much @wahwahwah. I used your script to build a nautilus-action, but it only worked correctly with these changes:

#!/bin/bash

for last; do true; done                   # Loop over all arguments; $last ends up holding the last one

OutFileName="$last/RESULT_$(date +"%d-%m-%Y").csv"   # Fix the output name

i=0                                       # Reset a counter
for filename in "$last/"*".csv"; do

 if [ "$filename" != "$OutFileName" ] ;      # Avoid recursion 
 then 
   if [[ $i -eq 0 ]] ; then 
      head -1  "$filename" > "$OutFileName" # Copy header if it is the first file
   fi
   tail -n +2  "$filename" >> "$OutFileName" # Append from the 2nd line each file
   i=$(( $i + 1 ))                        # Increase the counter
 fi
done
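The for last; do true; done line at the top of this script is a compact way of picking up the last argument. Here is a small sketch of that idiom next to a bash-only alternative; the function names and the sample arguments are made up for illustration.

```shell
#!/bin/bash
# Portable idiom: "for x" without "in" iterates over "$@",
# so after the loop the variable holds the final argument.
last_arg_loop() {
  local last=
  for last; do :; done
  printf '%s\n' "$last"
}

# Bash-only alternative: a negative offset in the parameter expansion.
# The space before -1 is required.
last_arg_bash() {
  printf '%s\n' "${@: -1}"
}

from_loop=$(last_arg_loop "first dir" "second dir" "target dir")
from_bash=$(last_arg_bash "first dir" "second dir" "target dir")
printf '%s\n%s\n' "$from_loop" "$from_bash"
```

Both forms keep an argument with spaces in it intact, as long as $last is quoted wherever it is used afterwards.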

Neil C. Obremski · Accepted Answer · 2023-03-20 20:58:17Z

Here's how I concatenate CSV files that have the same columns:

(head -qn 1 *.csv | head -n 1; tail -qn +2 *.csv) >combined.csv

To save time, call head on one specific file instead (tail still needs -q, otherwise it prints a file name banner before each file):

(head -n 1 first.csv; tail -qn +2 *.csv) >combined.csv

No scripting or funky awk necessary!
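One thing to watch with one-liners like these: the redirection creates combined.csv before *.csv is expanded, so when the output is written into the same directory it matches the glob and is read as one of its own inputs. A possible way around that, sketched here with made-up sample files, is to build the input list first and leave the output file out of it.

```shell
#!/bin/bash
# Hypothetical inputs, plus a stale output file left over from an earlier run.
workdir=$(mktemp -d)
printf 'h1,h2\n1,2\n' > "$workdir/a.csv"
printf 'h1,h2\n3,4\n' > "$workdir/b.csv"
printf 'stale\n' > "$workdir/combined.csv"

out="$workdir/combined.csv"

# Collect every CSV except the output file itself.
inputs=()
for f in "$workdir"/*.csv; do
  [ "$f" = "$out" ] || inputs+=("$f")
done

# Header from the first input, then the data rows of all inputs.
{ head -n 1 "${inputs[0]}"; tail -q -n +2 "${inputs[@]}"; } > "$out"

result=$(cat "$out")
printf '%s\n' "$result"
rm -rf "$workdir"
```

Writing the output to a different directory, or giving it a name that the glob does not match, avoids the problem without the extra loop.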

mmore500 · Accepted Answer · 2024-02-19 23:30:53Z

Give joinem a try, available via PyPI: python3 -m pip install joinem.

joinem provides a CLI for fast, flexible concatenation of tabular data using polars. I/O is lazily streamed in order to give good performance when working with numerous, large files.

Example Usage

Pass input files via stdin and output file as an argument.

ls -1 path/to/*.csv | python3 -m joinem out.parquet

You can add the --progress flag to report progress.

Further Information

joinem is also compatible with parquet, JSON, and feather file types. See the project's README for more usage examples and a full command-line interface API listing.

Disclosure: I am the author of joinem.
