
I want to perform the title-named action from the Linux command line (several commands or a bash script will also do). The command I tried is:

sed 's/href="([^"])"/$1/g' page.html > list.lst

but obviously it failed.

To be precise, here is my input:

<link rel="stylesheet" type="text/css" href="style/css/colors.css" />
<link rel="stylesheet" type="text/css" href="style/css/global.css" />
<link rel="stylesheet" type="text/css" href="style/css/icons.css" />

The output I want is a comma-separated or space-separated list of all matches in the input file:

style/css/colors.css,style/css/global.css,style/css/icons.css

I think I got the right expression: href="([^"]*)"

But I have no clue how to do this. sed performs a search/replace, which is not exactly what I want (on the contrary, I only need to keep the matches and throw the rest away, not replace them).


1 Answer

grep href page.html | sed 's/^.*href="\([^"]*\)".*$/\1/' | xargs | sed 's/ /,/g'

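This extracts all the lines that contain href and keeps only one href per line: because the leading .* is greedy, it is the last href on the line that gets captured, not the first. xargs then joins the results with spaces, and the final sed turns the spaces into commas. Also, refer to this post about parsing HTML with regular expressions.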
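If a line can hold more than one href, a variation along these lines may help. It is only a sketch: it assumes a grep that supports -o (GNU and BSD grep do), double-quoted attribute values, and a made-up helper name, extract_hrefs, that is not part of the original answer.

```shell
# extract_hrefs is a hypothetical helper name used for illustration.
# Step 1: grep -o prints every match on its own line, so several
#         hrefs on one input line are all kept.
# Step 2: sed strips the href=" prefix and the closing quote.
# Step 3: paste -s joins all lines into one, using a comma as delimiter.
extract_hrefs() {
    grep -o 'href="[^"]*"' "$1" |
        sed 's/^href="//; s/"$//' |
        paste -sd, -
}
```

It would be called as extract_hrefs page.html > list.lst. Unlike the xargs step, paste does not split on spaces or interpret quotes, so a path containing a space stays in one piece.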

  • This works great, thanks! As for the warning about parsing HTML with regular expressions, the input files won't contain anything other than these link elements, so it'll be OK I guess. I'll just put a warning about probable devilish corruption during use of the script.
    – BiAiB
    Commented Jul 26, 2011 at 15:09
  • @BiAiB, there are numerous things that can go wrong with parsing HTML with regex, such as using ' instead of " for attributes (or not using quotes at all), using spaces between href and =, putting href on a new line, and many others. So if you're not absolutely sure that the HTML will look exactly like that, it's probably a bad idea.
    – rid
    Commented Jul 26, 2011 at 15:12
  • Or simply a commented-out link node. Btw, I'm not sure single quotes are valid in XHTML. For now I'll use that because it's simple. When the time comes, it will be easy to replace.
    – BiAiB
    Commented Jul 26, 2011 at 15:33
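
Two of the cases raised in the comments (single-quoted values, and spaces between href and =) can be covered with a slightly looser pattern. This is a hedged sketch, not a real HTML parser: it assumes grep -o and sed -E are available, the helper name extract_hrefs_loose is made up, and it still misses unquoted values, attributes split across lines, and commented-out nodes.

```shell
# extract_hrefs_loose is a hypothetical helper name used for illustration.
# grep: -i ignores case, -E enables the (a|b) alternation between
#       double-quoted and single-quoted values, -o prints each match alone.
# sed:  drops everything up to and including the = sign, then removes
#       the surrounding quote characters.
extract_hrefs_loose() {
    grep -oiE "href[[:space:]]*=[[:space:]]*(\"[^\"]*\"|'[^']*')" "$1" |
        sed -E "s/^[^=]*=[[:space:]]*//; s/^[\"']//; s/[\"']\$//" |
        paste -sd, -
}
```

For input that is not under your control, a real HTML or XML parser is the safer choice, as the comments above suggest.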
