3

I'm looking for a way to scan a CSV file and delete every column, header included, that has no values in any of the lines after the first.

For example, I would want to delete column Test03 from the data below, including the Test03 header in the first line.

Test01,Test02,Test03,Test04  
11,22,,44  
11,22,,44  
11,22,,44  
11,22,,44  
11,22,,44  
11,22,,44  
3
  • The third column has a value in the first row but not the rest. Do you want to delete the third column in all but the first row?
    – John1024
    Commented May 7, 2014 at 6:18
  • I can help you with an awk command for this operation, but tell me this: Will the second line always be representative of the rest of the following lines?
    – bgStack15
    Commented May 7, 2014 at 15:43
  • Yes the second line and subsequent lines will have no values in column 3. Commented May 7, 2014 at 16:08

7 Answers

2

Here's an awk solution that works regardless of which columns are empty (the header is ignored when deciding whether a column is empty).

awk -F, '{
    a[NR]=$0
}NR>1{
    for (i=1;i<=NF;i++) 
        if(length($i)!=0) b[i]++
}END{
    for (k=1;k<=NR;k++) { 
        LINE="" ; 
        split(a[k],c,",") ; 
        for (j=1;j<=NF;j++) 
            if(b[j]>0) 
                LINE=LINE","c[j] ; 
        print substr(LINE,2,length(LINE)-1)
    } 
}' test.csv
1
  • I'm not a big fan of the style of starting the next stanza on the same line as the } of the previous one, but, otherwise, this looks like a good answer.  Welcome to Super User, and good job!  I hope you continue to make contributions as good as this. Commented Dec 1, 2018 at 15:48
0

In this particular case you can just do:

sed 's/,,/,/g' test.csv > new.csv

This'll replace all double commas with just one, effectively removing your empty column. Note that you'll need to remove the column from the header yourself.

1
  • Yes, that will remove the empty column from the lines after the first, but I want the column removed entirely, and automatically, whenever only the first line has a value in it. Commented May 7, 2014 at 14:08
0

If you already know which columns to delete (whether or not they are empty, and including the header), use the 'cut' command and list the fields to keep:

cut -d , -f 1,2,4 test.csv > new.csv
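
When the empty columns are not known in advance, the field list for cut could be computed first. The following is a minimal Python sketch, assuming plain comma-separated data with no quoted commas (the same limitation cut has); the function name cut_field_list is made up for illustration.

```python
def cut_field_list(lines):
    """Return a cut-style field list (e.g. "1,2,4") naming the columns
    that hold at least one non-empty value below the header line."""
    # Naive split on commas: quoted fields containing commas are NOT handled.
    rows = [line.rstrip("\r\n").split(",") for line in lines if line.strip()]
    if not rows:
        return ""
    header, body = rows[0], rows[1:]
    keep = [
        str(i + 1)  # cut numbers fields starting from 1
        for i in range(len(header))
        if any(i < len(row) and row[i].strip() for row in body)
    ]
    return ",".join(keep)


sample = [
    "Test01,Test02,Test03,Test04",
    "11,22,,44",
    "11,22,,44",
]
print(cut_field_list(sample))  # prints 1,2,4
```

The printed list can then be given to cut as the argument of -f.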
2
  • This is fine for plain numeric data, but beware that CSV values can contain commas (when properly quoted) and this will not handle that - you would need a full CSV parser.
    – nobody
    Commented May 7, 2014 at 17:44
  • Yes aware of cut and it's great if you know which columns are empty but for this I don't always know. Andrew - which CSV parser would you refer to? Commented May 7, 2014 at 19:46
0

awk joins the party.

awk -F "," '{print $1","$2","$4}' test.csv > new.csv
0

This calls for a program rather than a quick command. The best way to do it would be, as suggested by Andrew Medico, to employ a proper CSV parser (in the case of perl you have Text::CSV).

However, I thought I'd write a perl script that works in very simple cases:

perl -F, -lane 'if($.==1){@a=@F;next};for($i=0;$i<@F;$i++){if($F[$i] ne ""){push @c,$F[$i];push @b,$i}}if(@a){foreach(@b){push @t,$a[$_]};print join(",",@t);undef @a}print join(",",@c);undef @c' file.csv

This saves the first line and goes on to see if there are any empty fields in the next line. It then prints only the relevant headers, skipping the empty field in all lines.

Please note that it doesn't handle commas inside quoted strings. It does, however, turn:

Test01,Test02,Test03,Test04
11,22,,44
11,22,,44
11,22,,44
11,22,,44
11,22,,44
11,22,,44

into:

Test01,Test02,Test04
11,22,44
11,22,44
11,22,44
11,22,44
11,22,44
11,22,44
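
For data where values may contain quoted commas, the same idea can be sketched with a real CSV parser. The example below uses Python's standard csv module rather than the Text::CSV module mentioned above, and it is only an illustration: the function name drop_empty_columns and the sample data are assumptions, not part of the original script.

```python
import csv
import io


def drop_empty_columns(src, dst):
    """Copy CSV rows from src to dst, omitting every column whose
    cells below the header are all empty."""
    rows = [row for row in csv.reader(src) if row]  # skip blank lines
    if not rows:
        return
    body = rows[1:]  # the header does not count as a value
    width = max(len(row) for row in rows)
    keep = [
        i
        for i in range(width)
        if any(i < len(row) and row[i].strip() for row in body)
    ]
    writer = csv.writer(dst, lineterminator="\n")
    for row in rows:
        # Pad short rows so every output line has the same columns.
        writer.writerow([row[i] if i < len(row) else "" for i in keep])


# In-memory demonstration; two values contain quoted commas.
source = io.StringIO('Name,Note,Unused,Count\n"Smith, J","a,b",,3\nDoe,,,4\n')
result = io.StringIO()
drop_empty_columns(source, result)
print(result.getvalue())
```

With real files, the csv documentation recommends opening them with newline='', for example open('file.csv', newline='').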
0

While trying different bash approaches I needed to remove all empty columns (including the header) reliably. To solve this I used Python with Pandas.

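import pandas as pd

# test.csv is comma-separated, so the default separator applies;
# empty fields are read as NaN
data = pd.read_csv('test.csv')
# Drop only the columns in which every value is missing
data.dropna(axis=1, how='all').to_csv('test_clean.csv', index=False)

The important things here are axis=1, which tells Pandas to apply dropna to columns instead of rows, and how='all', which drops a column only when every value in it is missing (the default, how='any', would also drop columns that are missing just some values). Passing index=False keeps the row index out of the output file.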

0

For a typical user, the easiest way would be to import the data from this CSV file into Excel, remove the column, and export it again.
