
I've got a speed question. I have a bash script which parses information from TheTvDb.com. It downloads nearly 40,000 lines of data, then reduces it down to about 5000 lines of data which gets written to the hard disk. Then it reads the file and parses it into several files which are used later as a lookup table. It's basically taking all the information it sees before each "/Episode" and writing it to a specific file, then resetting for the next one.

It has to synchronize on the "/Episode" tag because there is a "FirstAired" tag outside of the episode tags. This ensures that the data is read in sequence rather than depending on each individual tag relating to an episode.

Here is the code in question:

  if [ -f "$mythicalLibrarian/$NewShowName/$NewShowName.xml" ]; then
    Ename=""
    actualEname=""
    FAired=""
    SeasonNr=""
    EpisodeNr=""
    recordNumber=0

    echo "Parsing Downloaded information: $NewShowName.xml "
    while read line
    do

      if [[ $line == \<\/Episode\> ]]; then
        (( ++recordNumber ))
        echo -ne "Building Record:$recordNumber ${actualEname:0:20}            \r" 1>&2
        echo "$actualEname" >> "$mythicalLibrarian/$NewShowName/$NewShowName.actualEname.txt"&
        Ename=`echo "$actualEname" |sed 's/;.*//'`
        echo "$Ename" >> "$mythicalLibrarian/$NewShowName/$NewShowName.Ename.txt"&
        echo "$FAired" >> "$mythicalLibrarian/$NewShowName/$NewShowName.FAired.txt"&
        echo "$SeasonNr" >> "$mythicalLibrarian/$NewShowName/$NewShowName.S.txt"&
        echo "$EpisodeNr" >> "$mythicalLibrarian/$NewShowName/$NewShowName.E.txt"&
        Ename=""
        actualEname=""
        FAired=""
        SeasonNr=""
        EpisodeNr=""

      #Get actual show name
      elif [[ $line == \<EpisodeName\>* ]]; then
        actualEname=`echo "$line" | sed -e s/'<\/EpisodeName>'// -e s/'<EpisodeName>'// -e s/'\&amp\;'/'\&'/ -e s/'\&quot\;'/'\"'/ -e s/'\&amp\;'/'\&'/ -e s/'\&ndash\;'/'-'/ -e s/'\&lt\;'/'\<'/ -e 's/'\&gt\;'/'\>'/' |tr -d '|\?\*\<\"\:\>\+\\\[\]\/'`

      #Get OriginalAirDate
      elif [[ $line == \<FirstAired\>* ]]; then
        FAired=`echo "$line" | sed -e s/'<FirstAired>'//g -e s/'<\/FirstAired>'//g`

      #Get Season number
      elif [[ $line == \<SeasonNumber\>* ]]; then
        SeasonNr=`echo "$line" |sed -e s/'<SeasonNumber>'// -e s/'<\/SeasonNumber>'//`

      #Get Episode number
      elif [[ $line == \<EpisodeNumber\>* ]]; then
        EpisodeNr=`echo "$line" |sed -e 's/<EpisodeNumber>//' -e 's/<\/EpisodeNumber>//'`

      fi
    done < "$mythicalLibrarian/$NewShowName/$NewShowName.xml"

    chmod 777 "$mythicalLibrarian"/"$NewShowName"/"$NewShowName".actualEname.txt
    chmod 666 "$mythicalLibrarian"/"$NewShowName"/"$NewShowName".Ename.txt
    chmod 666 "$mythicalLibrarian/$NewShowName/$NewShowName".FAired.txt
    chmod 666 "$mythicalLibrarian"/"$NewShowName"/"$NewShowName".S.txt
    chmod 666 "$mythicalLibrarian/$NewShowName/$NewShowName".E.txt
    GotNewInformation=1

  elif [ ! -f "$mythicalLibrarian/$NewShowName/$NewShowName.xml" ]; then
    echo "COULD NOT DOWNLOAD:www.thetvdb.com/api/$APIkey/series/$SeriesID/all/$Language.xml">>"$mythicalLibrarian"/output.log
  fi

Here is some of the data it is processing

<?xml version="1.0" encoding="UTF-8" ?>
<Data><Series>
  <Actors>|Fred Rogers|Adair Roth|Bert Lloyd|Bud Alder|Carol Saunders|Carole Switala|Deborah Neal Stampo|Don Brockett|Elsie Neal|Emilie Jacobson|Fred Michael|John Reardon|Jos|Judy Rubin|Keith David|Lenny Meledandri|Michael Horton|Robert Trow|Yoshi Ito|</Actors>
  <Airs_DayOfWeek></Airs_DayOfWeek>
  <Airs_Time></Airs_Time>
  <ContentRating></ContentRating>
  <FirstAired>1968-02-01</FirstAired>
  <Genre>|Children|</Genre>
  <Network>PBS</Network>
  <NetworkID></NetworkID>
  <Overview>&quot;In a little toy neighborhood, a tiny trolley rolls past a house at the end of a street.

  <Runtime>30</Runtime>
  <SeriesID>6843</SeriesID>
  <SeriesName>Mister Rogers' Neighborhood</SeriesName>
  <Status>Ended</Status>
  <added></added>
  <addedBy></addedBy>
  <banner>graphical/77750-g.jpg</banner>
  <fanart>fanart/original/77750-1.jpg</fanart>
  <poster></poster>
  <zap2it_id>SH002930</zap2it_id>
</Series>
<Episode>
  <EpisodeName>Change (1)</EpisodeName>
  <EpisodeNumber>1</EpisodeNumber>
  <FirstAired>1968-02-19</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Change (2)</EpisodeName>
  <EpisodeNumber>2</EpisodeNumber>
  <FirstAired>1968-02-20</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Change (3)</EpisodeName>
  <EpisodeNumber>3</EpisodeNumber>
  <FirstAired>1968-02-21</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Change (4)</EpisodeName>
  <EpisodeNumber>4</EpisodeNumber>
  <FirstAired>1968-02-22</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Change (5)</EpisodeName>
  <EpisodeNumber>5</EpisodeNumber>
  <FirstAired>1968-02-23</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 6</EpisodeName>
  <EpisodeNumber>6</EpisodeNumber>
  <FirstAired>1968-02-26</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 7</EpisodeName>
  <EpisodeNumber>7</EpisodeNumber>
  <FirstAired>1968-02-27</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 8</EpisodeName>
  <EpisodeNumber>8</EpisodeNumber>
  <FirstAired>1968-02-28</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 9</EpisodeName>
  <EpisodeNumber>9</EpisodeNumber>
  <FirstAired>1968-02-29</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 10</EpisodeName>
  <EpisodeNumber>10</EpisodeNumber>
  <FirstAired>1968-03-01</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 11</EpisodeName>
  <EpisodeNumber>11</EpisodeNumber>
  <FirstAired>1968-03-04</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 12</EpisodeName>
  <EpisodeNumber>12</EpisodeNumber>
  <FirstAired>1968-03-05</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 13</EpisodeName>
  <EpisodeNumber>13</EpisodeNumber>
  <FirstAired>1968-03-06</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 14</EpisodeName>
  <EpisodeNumber>14</EpisodeNumber>
  <FirstAired>1968-03-07</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 15</EpisodeName>
  <EpisodeNumber>15</EpisodeNumber>
  <FirstAired>1968-03-08</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Welcome Donkey Hodie (1)</EpisodeName>
  <EpisodeNumber>16</EpisodeNumber>
  <FirstAired>1968-03-11</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Welcome Donkey Hodie (2)</EpisodeName>
  <EpisodeNumber>17</EpisodeNumber>
  <FirstAired>1968-03-12</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Welcome Donkey Hodie (3)</EpisodeName>
  <EpisodeNumber>18</EpisodeNumber>
  <FirstAired>1968-03-13</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Welcome Donkey Hodie (4)</EpisodeName>
  <EpisodeNumber>19</EpisodeNumber>
  <FirstAired>1968-03-14</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Welcome Donkey Hodie (5)</EpisodeName>
  <EpisodeNumber>20</EpisodeNumber>
  <FirstAired>1968-03-15</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 21</EpisodeName>
  <EpisodeNumber>21</EpisodeNumber>
  <FirstAired>1968-03-18</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 22</EpisodeName>
  <EpisodeNumber>22</EpisodeNumber>
  <FirstAired>1968-03-19</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 23</EpisodeName>
  <EpisodeNumber>23</EpisodeNumber>
  <FirstAired>1968-03-20</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 24</EpisodeName>
  <EpisodeNumber>24</EpisodeNumber>
  <FirstAired>1968-03-21</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 25</EpisodeName>
  <EpisodeNumber>25</EpisodeNumber>
  <FirstAired>1968-03-22</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 26</EpisodeName>
  <EpisodeNumber>26</EpisodeNumber>
  <FirstAired>1968-03-25</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 27</EpisodeName>
  <EpisodeNumber>27</EpisodeNumber>
  <FirstAired>1968-03-26</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 28</EpisodeName>
  <EpisodeNumber>28</EpisodeNumber>
  <FirstAired>1968-03-27</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 29</EpisodeName>
  <EpisodeNumber>29</EpisodeNumber>
  <FirstAired>1968-03-28</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 30</EpisodeName>
  <EpisodeNumber>30</EpisodeNumber>
  <FirstAired>1968-03-29</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Red Monster (1)</EpisodeName>
  <EpisodeNumber>31</EpisodeNumber>
  <FirstAired>1968-04-01</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Red Monster (2)</EpisodeName>
  <EpisodeNumber>32</EpisodeNumber>
  <FirstAired>1968-04-02</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Red Monster (3)</EpisodeName>
  <EpisodeNumber>33</EpisodeNumber>
  <FirstAired>1968-04-03</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Red Monster (4)</EpisodeName>
  <EpisodeNumber>34</EpisodeNumber>
  <FirstAired>1968-04-04</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Red Monster (5)</EpisodeName>
  <EpisodeNumber>35</EpisodeNumber>
  <FirstAired>1968-04-05</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 36</EpisodeName>
  <EpisodeNumber>36</EpisodeNumber>
  <FirstAired>1968-04-08</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 37</EpisodeName>
  <EpisodeNumber>37</EpisodeNumber>
  <FirstAired>1968-04-09</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 38</EpisodeName>
  <EpisodeNumber>38</EpisodeNumber>
  <FirstAired>1968-04-10</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 39</EpisodeName>
  <EpisodeNumber>39</EpisodeNumber>
  <FirstAired>1968-04-11</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 40</EpisodeName>
  <EpisodeNumber>40</EpisodeNumber>
  <FirstAired>1968-04-12</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 41</EpisodeName>
  <EpisodeNumber>41</EpisodeNumber>
  <FirstAired>1968-04-15</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 42</EpisodeName>
  <EpisodeNumber>42</EpisodeNumber>
  <FirstAired>1968-04-16</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 43</EpisodeName>
  <EpisodeNumber>43</EpisodeNumber>
  <FirstAired>1968-04-17</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 44</EpisodeName>
  <EpisodeNumber>44</EpisodeNumber>
  <FirstAired>1968-04-18</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 45</EpisodeName>
  <EpisodeNumber>45</EpisodeNumber>
  <FirstAired>1968-04-19</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 46</EpisodeName>
  <EpisodeNumber>46</EpisodeNumber>
  <FirstAired>1968-04-22</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 47</EpisodeName>
  <EpisodeNumber>47</EpisodeNumber>
  <FirstAired>1968-04-23</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 48</EpisodeName>
  <EpisodeNumber>48</EpisodeNumber>
  <FirstAired>1968-04-24</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 49</EpisodeName>
  <EpisodeNumber>49</EpisodeNumber>
  <FirstAired>1968-04-25</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 50</EpisodeName>
  <EpisodeNumber>50</EpisodeNumber>
  <FirstAired>1968-04-26</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 51</EpisodeName>
  <EpisodeNumber>51</EpisodeNumber>
  <FirstAired>1968-04-29</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 52</EpisodeName>
  <EpisodeNumber>52</EpisodeNumber>
  <FirstAired>1968-04-30</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 53</EpisodeName>
  <EpisodeNumber>53</EpisodeNumber>
  <FirstAired>1968-05-01</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 54</EpisodeName>
  <EpisodeNumber>54</EpisodeNumber>
  <FirstAired>1968-05-02</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 55</EpisodeName>
  <EpisodeNumber>55</EpisodeNumber>
  <FirstAired>1968-05-03</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 56</EpisodeName>
  <EpisodeNumber>56</EpisodeNumber>
  <FirstAired>1968-05-06</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 57</EpisodeName>
  <EpisodeNumber>57</EpisodeNumber>
  <FirstAired>1968-05-07</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 58</EpisodeName>
  <EpisodeNumber>58</EpisodeNumber>
  <FirstAired>1968-05-08</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 59</EpisodeName>
  <EpisodeNumber>59</EpisodeNumber>
  <FirstAired>1968-05-09</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 60</EpisodeName>
  <EpisodeNumber>60</EpisodeNumber>
  <FirstAired>1968-05-10</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 61</EpisodeName>
  <EpisodeNumber>61</EpisodeNumber>
  <FirstAired>1968-05-13</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 62</EpisodeName>
  <EpisodeNumber>62</EpisodeNumber>
  <FirstAired>1968-05-14</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 63</EpisodeName>
  <EpisodeNumber>63</EpisodeNumber>
  <FirstAired>1968-05-15</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 64</EpisodeName>
  <EpisodeNumber>64</EpisodeNumber>
  <FirstAired>1968-05-16</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 65</EpisodeName>
  <EpisodeNumber>65</EpisodeNumber>
  <FirstAired>1968-05-17</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 66</EpisodeName>
  <EpisodeNumber>66</EpisodeNumber>
  <FirstAired>1968-05-20</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 67</EpisodeName>
  <EpisodeNumber>67</EpisodeNumber>
  <FirstAired>1968-05-21</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 68</EpisodeName>
  <EpisodeNumber>68</EpisodeNumber>
  <FirstAired>1968-05-22</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 69</EpisodeName>
  <EpisodeNumber>69</EpisodeNumber>
  <FirstAired>1968-05-23</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 70</EpisodeName>
  <EpisodeNumber>70</EpisodeNumber>
  <FirstAired>1968-05-24</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 71</EpisodeName>
  <EpisodeNumber>71</EpisodeNumber>
  <FirstAired>1968-05-27</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
  <EpisodeName>Show 72</EpisodeName>
  <EpisodeNumber>72</EpisodeNumber>
  <FirstAired>1968-05-28</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>

The problem is that on an i7 processor this takes 14.5 seconds. It is about 10x slower on my media center. I tried using a case statement instead, which takes 15 seconds on the fast processor.

I would like to know about how to speed this process up. It seems that this is ridiculously slow for BASH which is supposed to be designed around data manipulation and file operations.

6 Answers

4

You will get a considerable speedup by dropping the & from the end of all those echo statements.

Test1:

$ time { for i in {1..1000}; do echo "hello"& done >/dev/null; } | cat

real    0m10.357s
user    0m2.764s
sys     0m15.441s

The cat eats the "done" messages when this is run at the command line. A colon could be used instead of cat to suppress the "done" messages from the first timed test. It's not the program that's doing it; it's the fact that the backgrounded processes are part of a pipe.

Test2:

$ time { for i in {1..1000}; do echo "hello"; done >/dev/null; }

real    0m0.152s
user    0m0.132s
sys     0m0.020s

Note that this was on a very slow, old machine.

You may also get a speed improvement by using Bash's regex and string processing features instead of repeatedly spawning multiple external utilities in a loop.

Example:

elif [[ $line == \<EpisodeName\>* ]]; then
    actualEname=${line//<\/EpisodeName>/}
    actualEname=${actualEname//<EpisodeName>/}
    actualEname=${actualEname//&amp;/"&"}   # quoted: Bash 5.2+ treats a bare & in the replacement as the matched text
    actualEname=${actualEname//&ndash;/-}
    for string in '|' '&lt;' '&gt;' '&quot;' '?' '*' '<' '>' ':' '"' '+' '\' '[' ']' '/'
    do
        actualEname=${actualEname//"$string"}   # quoted so that ? and * are literal, not glob patterns
    done

You had an extra &amp; in that line and a lot of unnecessary single quotes and escaping, by the way. Also, you're converting HTML entities and then deleting them. Why not just delete them to begin with? You also seem to be missing some g (global) modifiers.
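The example above uses parameter expansion only. The regex feature mentioned earlier could look roughly like the sketch below, which pulls the tag name and its value out of a line in one match. The function name `tag_value` and the variables `tag` and `value` are made up for illustration and are not part of the original script.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split "<Tag>text</Tag>" into $tag and $value.
# Uses only Bash built-ins ([[ =~ ]] and BASH_REMATCH), so no process is spawned.
tag_value() {
    local line=$1
    local re='^[[:space:]]*<([A-Za-z_]+)>(.*)</([A-Za-z_]+)>[[:space:]]*$'
    # Accept the line only if the opening and closing tag names agree.
    if [[ $line =~ $re ]] && [[ ${BASH_REMATCH[1]} == "${BASH_REMATCH[3]}" ]]; then
        tag=${BASH_REMATCH[1]}
        value=${BASH_REMATCH[2]}
        return 0
    fi
    return 1
}

if tag_value '  <EpisodeName>Change (1)</EpisodeName>'; then
    printf '%s=%s\n' "$tag" "$value"
fi
```

Keeping the pattern in a variable (`re`) avoids having to escape `<`, `>` and `/` inside the `[[ ]]` expression. Entity handling would still need the substitutions shown above.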

Test3:

$ time { for i in {1..100}; do
    line='<EpisodeName>&lt;foo&amp;bar&ndash;baz&gt;Season&ndash;3&ndash;&quot;quux&quot;?*<>:"+\[]/</EpisodeName>'
    actualEname=$(echo "$line" | sed -e 's/<\/EpisodeName>//' -e 's/<EpisodeName>//' -e 's/&amp;/\&/g' -e 's/&quot;/"/g' -e 's/&ndash;/-/g' -e 's/&lt;/</g' -e 's/&gt;/>/g' |tr -d '|?*<":>+\\[]/')
done; }

real    0m7.779s
user    0m3.164s
sys     0m5.436s

Test4:

$ time { for i in {1..100}; do
    line='<EpisodeName>&lt;foo&amp;bar&ndash;baz&gt;Season&ndash;3&ndash;&quot;quux&quot;?*<>:"+\[]/</EpisodeName>'
    actualEname=${line//<\/EpisodeName>/}
    actualEname=${actualEname//<EpisodeName>/}
    actualEname=${actualEname//&amp;/"&"}   # quoted: Bash 5.2+ treats a bare & in the replacement as the matched text
    actualEname=${actualEname//&ndash;/-}
    for string in '|' '&lt;' '&gt;' '&quot;' '\?' '\*' '<' '>' ':' '"' '+' '\\' '[' ']' '\/'
    do
        actualEname=${actualEname//$string}
    done
done; }

real    0m5.403s
user    0m2.492s
sys     0m2.960s
  • Those '&lt;' '&gt;' '&quot;' need to be changed into '<' '>' and '"'. Any way to do that quicker? Commented Nov 26, 2010 at 19:23
  • Really, the important thing is the & char. Commented Nov 26, 2010 at 19:36
  • @Adam: In your original, you're changing them then deleting them. Why not just delete them to begin with? If you still want to do both, you can add more statements of the form actualEname=${actualEname//&lt;/<}. I've shown a method that's quicker. You could see if pre-processing your entire file outside the loop instead of line-by-line inside the loop is faster. while read -r type line; if [[ $type == "..." ]]; echo "$line" >> "$outputfile"; ... done < <(sed ... "$filename" | tr ...) where one of the things that sed would do is change <EpisodeName> into "EpisodeName" as the first... Commented Nov 26, 2010 at 19:52
  • ... word on the line which read would parse into the $type variable. But I really think you should probably use a proper XML tool. Commented Nov 26, 2010 at 19:53
  • That is really cool. I didn't know bash had inbuilt sed capabilities. Thank you very much. Commented Nov 26, 2010 at 22:06
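
The pre-processing idea from the comments above could be sketched as follows: sed runs once over the whole file and rewrites each line to "TagName value", and `read` then splits the tag from the value. The sample data, the `records` array and the `|`-separated record layout are assumptions for illustration. The original script appends to separate files instead.

```bash
#!/usr/bin/env bash
# Sketch: one sed process for the whole file instead of several per line.
xml=$(mktemp)
cat > "$xml" <<'EOF'
<Episode>
  <EpisodeName>Tom &amp; Jerry</EpisodeName>
  <EpisodeNumber>1</EpisodeNumber>
  <FirstAired>1968-02-19</FirstAired>
  <SeasonNumber>1</SeasonNumber>
</Episode>
EOF

records=()
while read -r type value; do
    case $type in
        Episode)       name='' episode='' aired='' season='' ;;  # reset per record
        EpisodeName)   name=$value ;;
        EpisodeNumber) episode=$value ;;
        FirstAired)    aired=$value ;;
        SeasonNumber)  season=$value ;;
        /Episode)      records+=("$season|$episode|$aired|$name") ;;
    esac
done < <(sed -E -e 's/&amp;/\&/g' \
             -e 's|^[[:space:]]*<([A-Za-z]+)>(.*)</[A-Za-z]+>[[:space:]]*$|\1 \2|' \
             -e 's|^[[:space:]]*<(/?[A-Za-z]+)>[[:space:]]*$|\1|' "$xml")
rm -f "$xml"

printf '%s\n' "${records[@]}"
```

Resetting the fields on the opening `Episode` line keeps the series-level `FirstAired` value from leaking into the first record.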
1

Use something like XMLStarlet which is designed to process XML.
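
A minimal sketch of what that might look like, assuming the `xmlstarlet` command is installed (some distributions name the binary `xml`) and that episodes sit at `/Data/Episode` as in the sample data. The function name `extract_episodes`, the `|` separator and the file name `show.xml` are placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: print one line per <Episode> as
# season|episode|air date|name. XMLStarlet decodes entities such as &amp;.
extract_episodes() {
    xmlstarlet sel -t -m '/Data/Episode' \
        -v 'SeasonNumber'  -o '|' \
        -v 'EpisodeNumber' -o '|' \
        -v 'FirstAired'    -o '|' \
        -v 'EpisodeName'   -n "$1"
}

# Guarded so the sketch does nothing where xmlstarlet or the file is missing.
if command -v xmlstarlet >/dev/null 2>&1 && [ -f show.xml ]; then
    extract_episodes show.xml
fi
```

Because each episode is selected as a node, the `FirstAired` element inside `<Series>` is never picked up by mistake.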

0

The slowdown is most likely due to the very high number of process spawns that are happening in that script (sed, tr).

You could achieve a much faster result by calling a program with an XML parser to read it in, and output to the various files. If you need to keep it in bash, maybe find something that can do XSLT to transform from the XML to the format used in the files and divide it up.

Personally I would do that sort of thing in Perl.

5
  • Yes. OP's code is already quite similar to what Perl would look like, so a rewrite shouldn't be difficult.
    – liori
    Commented Nov 26, 2010 at 18:18
  • Aye, Perl was originally a hybrid of C, sed, awk and sh anyhow.
    – Orbling
    Commented Nov 26, 2010 at 18:26
  • Who negative voted this, when what I said was the main component of the accepted answer "use less external programs"?!
    – Orbling
    Commented Nov 26, 2010 at 22:12
  • @liori This is a 134Kb bash script code.google.com/p/mythicallibrarian/source/browse/trunk/… I'm not rewriting this single function when this is what bash was designed for in the first place... Textual data manipulation and file operations is what this program does. There's about a million ways to speed up a bash script and I don't know them so that's why I'm here. Commented Nov 27, 2010 at 13:44
  • @Adam Aye, that's why reduction of externals was suggested, which I take it is what you did. The Perl comment was an edit, and just a comment given the short passage of code you put up.
    – Orbling
    Commented Nov 27, 2010 at 14:57
0

"BASH which is supposed to be designed around data manipulation and file operations."

Bash is designed for interactive command processing and linking programs together via pipes. Heavy data processing is not the design space of any *sh that I know of.

Python or Perl would be a much better choice for the problem space.

0

I just tried this:

        echo "Parsing Downloaded information: $NewShowName.xml "
        while read line
        do




            if [[ $line == \<\/Episode\> ]]; then
                (( ++recordNumber ))
                echo -ne "Building Record:$recordNumber ${actualEname:0:20}            \r" 1>&2 
                echo "$EpisodeName" >> "$mythicalLibrarian/$NewShowName/$NewShowName.actualEname.txt"&
                Ename=`echo "$actualEname" |sed 's/;.*//'`
                echo "$EpisodeName" >> "$mythicalLibrarian/$NewShowName/$NewShowName.Ename.txt"&
                echo "$FirstAired" >> "$mythicalLibrarian/$NewShowName/$NewShowName.FAired.txt"&
                echo "$SeasonNumber" >> "$mythicalLibrarian/$NewShowName/$NewShowName.S.txt"&
                echo "$EpisodeNumber" >> "$mythicalLibrarian/$NewShowName/$NewShowName.E.txt"&
                EpisodeName=""
                actualEname=""
                FirstAired=""
                SeasonNumber=""
                EpisodeNumber=""
            else 
                var=`echo $line |tr '<>' ' '|awk '{print $1}'`

                value=`echo "$line"|sed -e s/'<'"$var"'>'// -e s/'<\/'"$var"'>'// -e s/'\&amp\;'/'\&'/ -e s/'\&quot\;'/'\"'/ -e s/'\&amp\;'/'\&'/ -e s/'\&ndash\;'/'-'/ -e s/'\&lt\;'/'\<'/ -e 's/'\&gt\;'/'\>'/' |tr -d '|\?\*\<\"\:\>\+\\\[\]\/'`
                eval $var="'$value'"
            fi

This took 43 seconds on the faster processor.

0

Holy cow, Dennis Williamson, it parses in less than half a second. It just flickers across the screen. It used to take 15 seconds, but now it's so quick that I can't even tell that it's happening.

These are the changes that Dennis Williamson suggested. I'm just posting it here.

            echo "Parsing Downloaded information: $NewShowName.xml "
            while read line
            do

                if [[ $line == \<\/Episode\> ]]; then
                    (( ++recordNumber ))
                    echo -ne "Building Record:$recordNumber ${actualEname:0:20}           \r" 1>&2 
                    echo "$actualEname" >> "$mythicalLibrarian/$NewShowName/$NewShowName.actualEname.txt"

                    echo "$Ename" >> "$mythicalLibrarian/$NewShowName/$NewShowName.Ename.txt"
                    echo "$FAired" >> "$mythicalLibrarian/$NewShowName/$NewShowName.FAired.txt"
                    echo "$SeasonNr" >> "$mythicalLibrarian/$NewShowName/$NewShowName.S.txt"
                    echo "$EpisodeNr" >> "$mythicalLibrarian/$NewShowName/$NewShowName.E.txt"
                    Ename=""
                    actualEname=""
                    FAired=""
                    SeasonNr=""
                    EpisodeNr=""

#Get actual show name   
                elif [[ $line == \<EpisodeName\>* ]]; then
                    line=${line/<\/EpisodeName>/}
                    line=${line/<EpisodeName>/}
                    line=${line/&lt;}
                    line=${line/&gt;/} 
                    line=${line/&quot;/} 
                    line=${line/&amp;/"&"}   # quoted: Bash 5.2+ treats a bare & in the replacement as the matched text
                    line=${line/\|/}
                    line=${line/\?/}
                    line=${line/\*/}
                    line=${line/\:/}
                    line=${line/\+/}
                    line=${line/\\/}
                    line=${line/\//}
                    line=${line/\[/}
                    line=${line/\]/}
                    line=${line/\'/}
                    line=${line/\"/}
                    actualEname=${line/&ndash;/-}
                    Ename=${actualEname/;*/}

#Get OriginalAirDate
                elif [[ $line == \<FirstAired\>* ]]; then
                    line=${line/<\/FirstAired>/}
                    line=${line/<FirstAired>/}
                    FAired=$line

#Get Season number
                elif [[ $line == \<SeasonNumber\>* ]]; then
                    line=${line/<\/SeasonNumber>/}
                    line=${line/<SeasonNumber>/}
                    SeasonNr=$line

#Get Episode number
                elif [[ $line == \<EpisodeNumber\>* ]]; then
                    line=${line/<\/EpisodeNumber>/}
                    line=${line/<EpisodeNumber>/}
                    EpisodeNr=$line
                fi
            done < "$mythicalLibrarian/$NewShowName/$NewShowName.xml"


            chmod 666 "$mythicalLibrarian"/"$NewShowName"/"$NewShowName".actualEname.txt
            chmod 666 "$mythicalLibrarian"/"$NewShowName"/"$NewShowName".Ename.txt
            chmod 666 "$mythicalLibrarian/$NewShowName/$NewShowName".FAired.txt
            chmod 666 "$mythicalLibrarian"/"$NewShowName"/"$NewShowName".S.txt
            chmod 666 "$mythicalLibrarian/$NewShowName/$NewShowName".E.txt
            GotNewInformation=1
1
  • I'm glad it worked well for you. It's only necessary to escape #%*?\/ in the substitution operator and you don't need to "close" it if you're just doing a deletion, e.g. line=${line/|}. Commented Dec 6, 2010 at 15:14
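
The shorter deletion form from the comment above can be tried on its own with the sketch below. It also shows that a single slash removes only the first match, whereas the `tr -d` in the original script removed every occurrence, so the double-slash form is the closer equivalent. The sample string and variable names are illustrative only.

```bash
#!/usr/bin/env bash
# Illustrative input; not taken from the real data.
line='a|b|c?d'

first_only=${line/|}        # single slash, no closing slash: removes the first "|"
all_pipes=${line//|}        # double slash: removes every "|"
no_qmark=${all_pipes//\?}   # "?" is a glob character, so it is escaped

printf '%s\n' "$first_only" "$all_pipes" "$no_qmark"
```

This prints `ab|c?d`, `abc?d` and `abcd`, one per line.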
