Turning separate lines into a comma separated list with quoted entries

Question

I have the following data (a list of R packages parsed from a Rmarkdown file), that I want to turn into a list I can pass to R to install:

d3heatmap
data.table
ggplot2
htmltools
htmlwidgets
metricsgraphics
networkD3
plotly
reshape2
scales
stringr

I want to turn the list into a list of the form:

'd3heatmap', 'data.table', 'ggplot2', 'htmltools', 'htmlwidgets', 'metricsgraphics', 'networkD3', 'plotly', 'reshape2', 'scales', 'stringr'

I currently have a bash pipeline that goes from the raw file to the list above:

grep 'library(' Presentation.Rmd \
| grep -v '#' \
| cut -f2 -d\( \
| tr -d ')'  \
| sort | uniq

I want to add a step that turns the newlines into the comma separated list. I've tried adding tr '\n' '","', which fails. I've also tried a number of the following Stack Overflow answers, which also fail:

https://stackoverflow.com/questions/1251999/how-can-i-replace-a-newline-n-using-sed

This produces library(stringr)))phics) as the result.

https://stackoverflow.com/questions/10748453/replace-comma-with-newline-in-sed

This produces ,% as the result.

Can sed replace new line characters?

This answer (with the -i flag removed), produces output identical to the input.

Do the delimiters need to be comma-space, or is comma alone acceptable? — steeldriver, Commented Jan 17, 2017 at 18:29
Either is fine, but I do need a quote character surrounding the string, either ' or ". — fbt, Commented Jan 17, 2017 at 18:31
Am I the first to notice that the input data and the script to process it, are completely incompatible. There will be no output. — ctrl-alt-delor, Commented Jan 19, 2017 at 21:43
The script I listed is how I generate the input data. Someone asked for it. The actual input data would look something like this. Note that Github changes the formatting to remove the new lines. — fbt, Commented Jan 20, 2017 at 2:56

zeppelin · Accepted Answer · 2017-01-17 21:04:47Z


You can add quotes with sed and then merge the lines with paste, like this:

sed 's/^\|$/"/g'|paste -sd, -

If you are running a GNU coreutils based system (i.e. Linux), you can omit the trailing '-'.

If your input data has DOS-style line endings (as @phk suggested), you can modify the command as follows:

sed 's/\r//;s/^\|$/"/g'|paste -sd, -
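For readers who want to see the same two steps spelled out, here is a minimal Python sketch of what the sed and paste pipeline does: quote each line, then join the lines with commas. The function name quote_and_join and its parameters are made up for illustration. It relies on splitlines(), which accepts both \n and \r\n (and a few rarer line separators), so it is only an approximation of the sed 's/\r//' step.

```python
def quote_and_join(text, quote='"', sep=","):
    """Wrap every line of text in quote characters and join them with sep."""
    # splitlines() splits on \n and \r\n alike, so DOS line endings
    # disappear here without a separate clean-up step.
    lines = text.splitlines()
    return sep.join(f"{quote}{line}{quote}" for line in lines)


if __name__ == "__main__":
    # Mixed line endings on purpose: the first line ends in \r\n.
    sample = "d3heatmap\r\ndata.table\nggplot2\n"
    print(quote_and_join(sample))
```

Passing quote="'" and sep=", " would give the single-quoted, comma+space form asked for in the question.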

edited Jan 17, 2017 at 21:04

answered Jan 17, 2017 at 18:56


On MacOS (and maybe others), you will need to include a dash to indicate that the input is from stdin rather than a file: sed 's/^\|$/"/g'|paste -sd, -
– cherdt
Commented Jan 17, 2017 at 19:08
True, "coreutils" version of paste will accept both forms, but "-" is more POSIX. Thx !
– zeppelin
Commented Jan 17, 2017 at 19:21
Or just with sed alone: sed 's/.*/"&"/;:l;N;s/\n\(.*\)$/, "\1"/;tl'
– Digital Trauma
Commented Jan 17, 2017 at 20:09
@fbt The note I now added at the end of my answer applies here as well.
– phk
Commented Jan 17, 2017 at 20:52
@DigitalTrauma - not really a good idea; that would be very slow (might even hang with huge files) - see the answers to the Q I linked in my comment on the Q here; the cool thing is to use paste alone ;)
– don_crissti
Commented Jan 17, 2017 at 21:07

phk · Accepted Answer · 2017-01-18 18:11:03Z

Using awk:

awk 'BEGIN { ORS="" } { print p"'"'"'"$0"'"'"'"; p=", " } END { print "\n" }' /path/to/list

Alternative with less shell escaping and therefore more readable:

awk 'BEGIN { ORS="" } { print p"\047"$0"\047"; p=", " } END { print "\n" }' /path/to/list

Output:

'd3heatmap', 'data.table', 'ggplot2', 'htmltools', 'htmlwidgets', 'metricsgraphics', 'networkD3', 'plotly', 'reshape2', 'scales', 'stringr'

Explanation:

The awk script itself without all the escaping is BEGIN { ORS="" } { print p"'"$0"'"; p=", " } END { print "\n" }. After printing the first entry the variable p is set (before that it behaves like an empty string). Every entry (or in awk-speak: record) is prefixed with this variable p and printed with single quotes around it. The awk output record separator variable ORS is not needed (since the prefix is doing its job) so it is set to be empty at the BEGINning. Oh, and we might want our output to END with a newline (e.g. so it works with further text-processing tools); should this not be needed, the part with END and everything after it (inside the single quotes) can be removed.
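The prefix trick described above is not specific to awk. As a rough sketch (the function name join_with_prefix is hypothetical, and this is a Python rendering of the idea, not a translation the answer itself provides), the same logic looks like this:

```python
import io


def join_with_prefix(lines, quote="'", sep=", "):
    """Emit each record quoted, preceded by a prefix that is empty only once."""
    out = io.StringIO()
    prefix = ""  # empty before the first record, like awk's unset variable p
    for line in lines:
        record = line.rstrip("\n")
        out.write(prefix + quote + record + quote)
        prefix = sep  # every later record is preceded by the separator
    out.write("\n")  # corresponds to the END { print "\n" } part
    return out.getvalue()


if __name__ == "__main__":
    print(join_with_prefix(["d3heatmap\n", "data.table\n", "ggplot2\n"]), end="")
```

Because the separator is written before each record rather than after it, there is never a trailing separator to trim.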

Note

If you have Windows/DOS-style line endings (\r\n), you have to convert them to UNIX style (\n) first. To do this you can put tr -d '\015' at the beginning of your pipeline:

tr -d '\015' < /path/to/input.list | awk […] > /path/to/output

(Assuming you don't have any use for \rs in your file. Very safe assumption here.)

Alternatively, simply run dos2unix /path/to/input.list once to convert the file in-place.

When I run this command, I get ', 'stringr23aphics as the output. — fbt, Commented Jan 17, 2017 at 20:21
I know, right‽ :) I thought about mentioning that in many shells print p"'\''"$0"'\''"; would have also worked (it's not POSIXy though), or alternatively using bash's C quoting strings ($'') even just print p"\'"$0"\'"; (might have required doubling other backslashes though) but there's already the other method using awk's character escapes. — phk, Commented Jan 17, 2017 at 21:41

Community · Accepted Answer · 2017-04-13 12:36:51Z

As @don_crissti's linked answer shows, the paste option borders on incredibly fast -- the linux kernel's piping is more efficient than I would have believed if I hadn't just now tried it. Remarkably, if you can be happy with a single comma separating your list items rather than a comma+space, a paste pipeline

(paste -d\' /dev/null - /dev/null | paste -sd, -) <input

is faster than even a reasonable flex program(!)

%option 8bit main fast
%%
.*  { printf("'%s'",yytext); }
\n/(.|\n) { printf(", "); }

But if just decent performance is acceptable (and if you're not running a stress test, you won't be able to measure any constant-factor differences, they're all instant) and you want both flexibility with your separators and reasonable one-liner-y-ness,

sed "s/.*/'&'/;H;1h;"'$!d;x;s/\n/, /g'

is your ticket. Yes, it looks like line noise, but the H;1h;$!d;x idiom is the right way to slurp up everything. Once you can recognize that, the whole thing actually gets easy to read: it's s/.*/'&'/ followed by a slurp and a s/\n/, /g.
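If the hold-space idiom is unfamiliar, the following Python sketch walks through the same steps one command at a time. It is only a model of what the sed one-liner does, with a made-up function name (slurp_quote_join) and a plain string standing in for sed's hold space.

```python
def slurp_quote_join(lines, sep=", "):
    """Model of: s/.*/'&'/ ; H ; 1h ; $!d ; x ; s/\\n/, /g"""
    hold = ""  # stands in for sed's hold space
    for number, line in enumerate(lines, start=1):
        pattern = "'" + line.rstrip("\n") + "'"  # s/.*/'&'/
        if number == 1:
            hold = pattern  # 1h: the first line replaces the hold space
        else:
            hold = hold + "\n" + pattern  # H: append newline + pattern space
        # $!d: nothing is printed until the last line has been read
    # On the last line: x brings the hold space back, then s/\n/, /g
    return hold.replace("\n", sep)


if __name__ == "__main__":
    print(slurp_quote_join(["d3heatmap\n", "data.table\n", "ggplot2\n"]))
```

The separator is an ordinary string here, which mirrors the flexibility the sed version has over paste's single-character delimiters.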

edit: bordering on the absurd, it's fairly easy to get flex to beat everything else hollow, just tell stdio you don't need the builtin multithread/signalhandler sync:

%option 8bit main fast
%%
.+  { putchar_unlocked('\'');
      fwrite_unlocked(yytext,yyleng,1,stdout);
      putchar_unlocked('\''); }
\n/(.|\n) { fwrite_unlocked(", ",2,1,stdout); }

and under stress that's 2-3x quicker than the paste pipelines, which are themselves at least 5x quicker than everything else.

(paste -d\ \'\' /dev/null /dev/null - /dev/null | paste -sd, -) <infile | cut -c2- would do comma+space @ pretty much the same speed though as you noted, it's not really flexible if you need some fancy string as separator — don_crissti, Commented Jan 18, 2017 at 11:33
That flex stuff is pretty damn cool man... this is the first time I see someone posting flex code on this site... big upvote ! Please post more of this stuff. — don_crissti, Commented Jan 24, 2017 at 21:39
@don_crissti Thanks! I'll look for good opportunities, sed/awk/whatnot are usually better options just for the convenience value but there's often a pretty easy flex answer too. — jthill, Commented Jan 25, 2017 at 22:19

phk · Accepted Answer · 2017-01-20 06:31:38Z

I think the following should do just fine, assuming your data is in the file text

d3heatmap
data.table
ggplot2
htmltools
htmlwidgets
metricsgraphics
networkD3
plotly
reshape2
scales
stringr

Let's use arrays which have the substitution down cold:

#!/bin/bash
input=( $(cat text) ) 
output=( $(
for i in ${input[@]}
        do
        echo -ne "'$i',"
done
) )
output=${output:0:-1}
echo ${output//,/, }

The output of the script should be as follows:

'd3heatmap', 'data.table', 'ggplot2', 'htmltools', 'htmlwidgets', 'metricsgraphics', 'networkD3', 'plotly', 'reshape2', 'scales', 'stringr'

I believe this was what you were looking for?

edited Jan 20, 2017 at 6:31

phk

answered Jan 19, 2017 at 21:41

Charles van der Genugten

Nice solution. But while OP didn't explicitly ask for bash and while it is safe to assume that someone might use it (after all AFAIK it's the most used shell) it still shouldn't be taken for granted. Also, there are parts where you could do a better job at quoting (putting in double quotes). For example, while the package names are unlikely to have spaces in them it still is good convention to quote variables rather than not, you might want to run shellcheck.net over it and see the notes and explanations there.
– phk
Commented Jan 20, 2017 at 6:36

Sergiy Kolodyazhnyy · Accepted Answer · 2020-09-15 07:41:11Z

Python

Python one-liner:

$ python -c "import sys; print(','.join([repr(l.strip()) for l in sys.stdin]))" < input.txt                               
'd3heatmap','data.table','ggplot2','htmltools','htmlwidgets','metricsgraphics','networkD3','plotly','reshape2','scales','stringr'

It works in a simple way: we redirect input.txt into stdin using the shell's < operator, then read each line into a list, with .strip() removing newlines and repr() creating a quoted representation of each line. The list is then joined into one big string via the .join() method, with , as the separator.

Alternatively we could use + to concatenate quotes to each stripped line.

 python -c "import sys;sq='\'';print(','.join([sq+l.strip()+sq for l in sys.stdin]))" < input.txt

Perl

Essentially the same idea as before: read all lines, strip the trailing newline, enclose each line in single quotes, push everything into the array @cvs, and print the array values joined with commas.

$ perl -ne 'chomp; $sq = "\047" ; push @cvs,"$sq$_$sq";END{ print join(",",@cvs)   }'  input.txt
'd3heatmap','data.table','ggplot2','htmltools','htmlwidgets','metricsgraphics','networkD3','plotly','reshape2','scales','stringr'

IIRC, pythons's join should be able to take an iterator therefore there should be no need to materialize the stdin loop to a list — iruvar, Commented Jan 20, 2017 at 6:37
@iruvar Yes, except look at OP's desired output - they want each word quoted, and we need to remove trailing newlines to ensure output is one line. You have an idea how to do that without a list comprehension ? — Sergiy Kolodyazhnyy, Commented Jan 20, 2017 at 6:44
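On the question raised in these comments: str.join() accepts any iterable, so the stripping and quoting can happen inside a generator expression with no list comprehension. The sketch below shows one way to write that; the function name quoted_csv is made up for illustration. It adds the quote characters explicitly instead of using repr(), because repr() switches to double quotes when a string itself contains a single quote.

```python
def quoted_csv(lines, quote="'"):
    """Strip each line, wrap it in quotes, and join the results with commas."""
    # The argument to join() is a generator expression, not a list.
    return ",".join(quote + line.strip() + quote for line in lines)


if __name__ == "__main__":
    sample = ["d3heatmap\n", "data.table\n", "ggplot2\n"]
    print(quoted_csv(sample))
```

Any iterable of lines can be passed in, including sys.stdin or an open file object.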

Rolf · Accepted Answer · 2017-01-24 09:01:55Z

I often have a very similar scenario: I copy a column from Excel and want to convert the content into a comma separated list (for later usage in a SQL query like ... WHERE col_name IN <comma-separated-list-here>).

This is what I have in my .bashrc:

function lbl {
    TMPFILE=$(mktemp)
    cat $1 > $TMPFILE
    dos2unix $TMPFILE
    (echo "("; cat $TMPFILE; echo ")") | tr '\n' ',' | sed -e 's/(,/(/' -e 's/,)/)/' -e 's/),/)/'
    rm $TMPFILE
}

I then run lbl ("line by line") on the cmd line which waits for input, paste the content from the clipboard, press <C-D> and the function returns the input surrounded with (). This looks like so:

$ lbl
1
2
3
dos2unix: converting file /tmp/tmp.OGM6UahLTE to Unix format ...
(1,2,3)

(I don't remember why I put the dos2unix in here, presumably because this often causes trouble in my company's setup.)
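For comparison, here is a small Python sketch of the same "line by line" idea: take pasted text, tolerate DOS line endings, and return a parenthesised list for an SQL IN clause. The function name in_list is hypothetical. Unlike the shell function above, this version also skips blank lines, which is an assumption about what is wanted and not something the original does.

```python
def in_list(text):
    """Turn one-item-per-line text into '(a,b,c)' for an SQL IN clause."""
    # splitlines() copes with \r\n, so no dos2unix step is needed here.
    # Blank lines are dropped (an added assumption, see the note above).
    items = [line.strip() for line in text.splitlines() if line.strip()]
    return "(" + ",".join(items) + ")"


if __name__ == "__main__":
    print(in_list("1\r\n2\r\n3\r\n"))
```

The items are inserted verbatim, so string values would still need their own quoting before being used in a query.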

PaulC · Accepted Answer · 2017-01-18 06:40:49Z

Some versions of sed behave a little differently, but on my mac, I can handle everything but the "uniq" in sed:

sed -n -e '
# Skip commented library lines
/#/b
# Handle library lines
/library(/{
    # Replace line with just quoted filename and comma
    # Extra quoting is due to command-line use of a quote
    s/library(\([^)]*\))/'\''\1'\'', /
    # Exchange with hold, append new entry, remove the new-line
    x; G; s/\n//
    ${
        # If last line, remove trailing comma, print, quit
        s/, $//; p; b
    }
    # Save into hold
    x
}
${
    # Last line not library
    # Exchange with hold, remove trailing comma, print
    x; s/, $//; p
}
'

Unfortunately to fix the unique part you have to do something like:

grep library Presentation.Rmd | sort -u | sed -n -e '...'

--Paul
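As a point of comparison, the whole job (finding the library( lines, skipping commented ones, removing duplicates, sorting, quoting and joining) fits in a short Python sketch. The function name package_list and the regular expression are assumptions made for illustration. Like grep -v '#', it drops any line containing a # character, and it sorts by code point, which may order names differently from sort in some locales.

```python
import re

# Captures whatever sits between "library(" and the next ")".
LIBRARY_CALL = re.compile(r"library\(([^)]*)\)")


def package_list(rmd_text):
    """Return the unique library() names as a sorted, quoted, comma+space list."""
    names = set()  # a set takes care of the "uniq" part
    for line in rmd_text.splitlines():
        if "#" in line:  # skip commented lines, like grep -v '#'
            continue
        match = LIBRARY_CALL.search(line)
        if match:
            names.add(match.group(1).strip())
    return ", ".join(f"'{name}'" for name in sorted(names))


if __name__ == "__main__":
    sample = (
        "library(ggplot2)\n"
        "# library(foo)\n"
        "library(data.table)\n"
        "library(ggplot2)\n"
        "x <- 1\n"
    )
    print(package_list(sample))
```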

Welcome to Unix.stackexchange! I recommend you take the tour. — Stephen Rauch, Commented Jan 18, 2017 at 6:54

Fran · Accepted Answer · 2018-10-10 01:16:31Z

It is funny that, given a plain text list of R packages to install in R, nobody proposed a solution that uses that list directly in R; instead everyone fights with bash, perl, python, awk, sed or whatever to put quotes and commas in the list. This is not necessary at all, and moreover it does not address how to get the transformed list into R and use it.

You can simply load the plain text file (say, packages.txt) as a data frame with a single variable, which you can extract as a vector that is directly usable by install.packages. So converting it into a usable R object and installing that list is just:

df <- read.delim("packages.txt", header=F, strip.white=T, stringsAsFactors=F)
install.packages(df$V1)

Or without an external file:

packages <-" 
d3heatmap
data.table
ggplot2
htmltools
htmlwidgets
metricsgraphics
networkD3
plotly
reshape2
scales
stringr
"
df <- read.delim(textConnection(packages), 
header=F, strip.white=T, stringsAsFactors=F)
install.packages(df$V1)
