I am asking for your help, please.

I reviewed a lot of sources and did some testing with awk and sed, but I can't get it to work. Below is a snippet of a config file from which I could get output via grep, but not in the form I need.

> file.txt
> 
> "<property>name="DBName"><value>ABC</value>name="DBName"><value>DEF</value></property>
> 
> cat file.xml | grep -o -P '.name="DBName"><value>.{0,20}'
> name="DBName"><value>ABC</value>
> name="DBName"><value>DEF</value></propert

The desired output is:

ABC
DEF

Thanks for any help.

Andi

4
  • 2
    Welcome, please edit the question, specifically the code part. Remove the trailing > and add the relevant snippets of the file file.xml separately from the grep command. The best approach for XML files is generally to use tools designed for parsing that format, like xmllint (the same applies to other structured formats like JSON). Commented Dec 21, 2023 at 14:27
  • If your data is actual XML, it looks like a < is missing before both name= occurrences. Can you please clarify? Apart from this, you don't need to cat someFile | grep [options]; you can directly grep [options] someFile.
    – Httqm
    Commented Dec 21, 2023 at 14:50
  • 1
    You show the content of file.txt but then your code cats file.xml, and your file.txt has lines in double quotes but missing some <s, etc. Please clean up your question to clearly show us the actual input you need help parsing. I suspect you have a file.xml which you're pre-processing to create file.txt and are now asking for help with further processing. Don't do that, as it's unlikely to be the best approach; just show the contents of file.xml and the desired final output.
    – Ed Morton
    Commented Dec 21, 2023 at 15:17
  • Try this: grep -oP '\>[A-Za-z0-9]+\<' infile | tr -d '\<\>'
    – user167612
    Commented Dec 21, 2023 at 20:01

5 Answers

2

If (huge, enormous "if") your file really only has the very simple case where you want the exact string <value> followed by a few non-< characters and then </value>, so your problem can be formulated as "fetch me the string of simple, non-newline characters found between each occurrence of <value> and the first < after it", then you can do (using GNU grep):

grep -oP '<value>\K[^<]+' file

Of course, this will fail on anything even slightly different: if you have multi-line values, for example, or if the value tag can look like <value foo=bar>, or in any number of other, perfectly valid XML cases. The Right Way© is to use an XML parser instead. You might want to check out xmllint or XMLStarlet, among others.
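
To make the fragility concrete, here is a minimal sketch using a made-up one-line sample (not the asker's real file) in which the second value tag carries an attribute. It assumes GNU grep built with PCRE support:

```shell
# Hypothetical sample: the second <value> tag carries an attribute.
sample='<property name="DBName"><value>ABC</value></property><property name="DBName"><value foo="bar">DEF</value></property>'

# \K discards everything matched so far, so only the text after the
# literal string "<value>" is printed. The tag with an attribute does
# not contain that literal string and is silently skipped.
result=$(printf '%s\n' "$sample" | grep -oP '<value>\K[^<]+')

printf '%s\n' "$result"   # prints only: ABC
```

DEF is lost without any error message, which is the kind of silent failure an XML parser avoids.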

0
1

If all of your input looks exactly like the one line of sample input you posted:

$ cat file
"<property>name="DBName"><value>ABC</value>name="DBName"><value>DEF</value></property>

then using any awk:

$ awk -F'[<>]+' '{for (i=5; i<=NF; i+=4) print $i}' file
ABC
DEF

but, like any other solution that doesn't use an XML parser, that is fragile.
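
To see where the start index 5 and the step 4 come from, a small sketch like the following prints every field awk produces for the sample line when splitting on runs of < and >. The variable names are only for illustration:

```shell
# The one line of sample input from the question.
line='"<property>name="DBName"><value>ABC</value>name="DBName"><value>DEF</value></property>'

# Split on runs of < and > and print each field with its index.
fields=$(printf '%s\n' "$line" |
  awk -F'[<>]+' '{ for (i = 1; i <= NF; i++) printf "%d: %s\n", i, $i }')

printf '%s\n' "$fields"
```

The values land in fields 5 and 9, with tag names and attributes in the fields between them. Any change in the number of tags before or between the values shifts those indices, which is why the approach is fragile.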

1

Using XMLStarlet:

XML file taken from the answer.

<config>
 <property name="DBName"><value>ABC</value></property>
 <property name="DBName" year="2023"><value>DEF</value></property>
 <property name="SystemName"><value>s70</value></property>
</config>

$ xmlstarlet select -t -v '//value' --nl ex.xml
ABC
DEF
s70

$ xmlstarlet select -t -m '//property[@name="DBName"]' -v 'value' --nl ex.xml
ABC
DEF

Using awk:

$ awk -v pat1="<value>" -v pat2="</value>" '
   {
       while (match($0, pat1)){ 
           $0=substr($0,RSTART+RLENGTH);
           if (match($0, pat2)) print substr($0,1,RSTART-1)
       }
   }
' ex.xml

The awk and pcregrep solutions work for the input in this question, but they are not XML-aware and may fail on other input.

$ pcregrep -o1 '<value>(.*?)</value>' ex.xml
0

Thanks a lot, this worked for me. To sum up, here is more sample output and the final solution for me:

host123:~ # grep -oP '.{0,50}name="DBName"><value>.{0,100}' file.xml
ystemName"><value>XYZ</value></property><property name="DBName"><value>ABC-SERVER1</value></property><property name="SystemHome"><value>host123</value></property><prope 
ystemName"><value>XYZ</value></property><property name="DBName"><value>DEF</value></property><property name="SystemHome"><value>host123</value></property><property name=


host123:~ # grep -oP 'name="DBName"><value>\K[^<]+' file.xml 
ABC-SERVER1 
DEF
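
For systems where grep lacks -P (for example BSD or macOS grep), a rough equivalent can be sketched with grep -o plus sed. The sample lines below are modelled on the output above and are not the real file.xml:

```shell
# Two sample lines modelled on the grep output shown above.
sample=$(printf '%s\n' \
  '<property name="SystemName"><value>XYZ</value></property><property name="DBName"><value>ABC-SERVER1</value></property>' \
  '<property name="SystemName"><value>XYZ</value></property><property name="DBName"><value>DEF</value></property>')

# Step 1: grep -o prints each match (marker plus value text) on its own line.
# Step 2: sed strips everything up to and including "<value>".
db_names=$(printf '%s\n' "$sample" |
  grep -o 'name="DBName"><value>[^<]*' |
  sed 's/.*<value>//')

printf '%s\n' "$db_names"
```

This prints ABC-SERVER1 and DEF on separate lines. It shares the same limitation as the -P version: it depends on the exact text name="DBName"><value> appearing in the file.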
0

If you have well-formed XML input such as

<config>
 <property name="DBName"><value>ABC</value></property>
 <property name="DBName" year="2023"><value>DEF</value></property>
 <property name="SystemName"><value>s70</value></property>
</config>

you can use an XML-aware tool and use XPath (or similar) to select the parts to extract. Example with xidel:

## 1) get all values:
$ xidel -e "//value" ex.xml 
ABC
DEF
s70
## 2) get the values inside "property" with attribute "name" equal to "DBName"
$ xidel -e "//property[@name='DBName']/value" ex.xml 
ABC
DEF
