
I am working with vendor-provided software that sends attachment files to another script, which extracts the text from the listed file. The process fails when we receive files from an outside source whose names contain spaces: the vendor-supplied software does not surround the filename in quotes, so the text-extraction script receives a filename that has been split apart at the space, and the extractor errors out. We cannot edit the vendor-provided software.

This whole process is designed to be an automated transfer, so having this wrench that could be randomly thrown into the gears is an issue.

What we're trying to do is handle the spaced name in our text-extractor script, since that is the piece we have some control over. After a quick Google search, it seemed that changing the IFS value for the script would be the quick solution, but unfortunately that change would only take effect after the incoming data has already been split.

The script I'm using takes a -e value, a -i value, and a -o value. These values are sent by the vendor-supplied software, which I have no editing control over.

#!/bin/bash

usage() { echo "Usage: $0 -i input -o output -e encoding" 1>&2; exit 1; }

while getopts ":o:i:e:" o; do
    case "${o}" in
        i)
            inputfile=${OPTARG}
            ;;
        o)
            outputfile=${OPTARG}
            ;;
        e)
            encoding=${OPTARG}
            ;;
        *)
            usage
            ;;
    esac
done
shift $((OPTIND-1))

...
...
<Uses the inputfile, outputfile, and encoding variables>

I admit there may be pieces of this I don't fully understand, and it could be a simple fix, but my end goal is to extract -o, -i, and -e so that each contains a single value, regardless of the spaces within it. I can handle the quoting inside the script once I can extract the filename value.

3 Answers

3

The script fragment that you have posted does not have any issues with spaces in the arguments.

The following, for example, does not need quoting (since it's an assignment):

inputfile=${OPTARG}

All other uses of $inputfile in the script should be double quoted.
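
As a minimal illustration of why the quoting matters (the file name and the count_args helper are made up for this example, they are not part of the original script):

```bash
#!/bin/bash
# count_args is a hypothetical helper that prints how many arguments it received.
count_args() { echo "$#"; }

inputfile="hello world.txt"             # plain assignment: the value is stored intact

unquoted=$(count_args $inputfile)       # unquoted use: split into "hello" and "world.txt"
quoted=$(count_args "$inputfile")       # quoted use: passed on as a single argument

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
```

With the default IFS, the unquoted call sees two arguments and the quoted call sees one.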

What matters is how this script is called.

This would fail and would assign only hello to the variable inputfile:

$ ./script.sh -i hello world.txt

The string world.txt would prompt the getopts function to stop processing the command line and the script would continue with the shift (world.txt would be left in $1 afterwards).
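
A small sketch of that behaviour, wrapping a cut-down version of the same getopts loop in a function so it can be called twice (the parse function and the leftover variable are names invented for this example):

```bash
#!/bin/bash
# parse is a hypothetical wrapper around the getopts loop from the question.
parse() {
    local OPTIND=1 o            # reset OPTIND so the function can be called repeatedly
    inputfile="" leftover=""
    while getopts ":o:i:e:" o; do
        case "$o" in
            i) inputfile=$OPTARG ;;
        esac
    done
    shift $((OPTIND-1))
    leftover=$1                 # first argument that getopts did not consume, if any
}

parse -i hello world.txt
echo "unquoted call: inputfile=[$inputfile] leftover=[$leftover]"

parse -i "hello world.txt"
echo "quoted call:   inputfile=[$inputfile] leftover=[$leftover]"
```

In the first call getopts stops at world.txt, which is not an option, so it ends up in leftover. In the second call the whole name reaches inputfile and nothing is left over.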

The following would correctly assign the string hello world.txt to inputfile:

$ ./script.sh -i "hello world.txt"

as would

$ ./script.sh -i hello\ world.txt
  • I understand that there are ways to catch this before executing the script, but unfortunately, this script is called from another program which I cannot modify, so I have to handle the splitting of the filename after it is fed into this script.
    – MeanJerry
    Commented Sep 8, 2017 at 16:43
  •
    @MeanJerry You never said anything about an actual problem. Do you in fact see a problem or not? From the question it seems as if you were just worried. Commented Sep 8, 2017 at 16:44
  • @MeanJerry, you can't handle the splitting "after it's fed in". Splitting is done by the program that invokes your script before the script is started, and before the shell that executes it is in memory. Commented Sep 8, 2017 at 16:46
  • @MeanJerry, ...look at the man page for the execve syscall. That's the OS-level interface used for one program to invoke another. You'll see that it requires an argument list to already be split into a separate C string per argument before the program being executed is invoked at all. Commented Sep 8, 2017 at 16:47
  •
    It also matters how you use the $inputfile variable in your script. You need to quote it everywhere: "$inputfile" -- see Security implications of forgetting to quote a variable in bash/POSIX shells Commented Sep 8, 2017 at 17:18
1

The following script uses awk to split the arguments while keeping spaces in the file names. The arguments can be in any order. It does not preserve multiple consecutive spaces in an argument; it collapses them to one.

#!/bin/bash

IFS=' '
str=$(printf "%s" "$*")

istr=$(echo "${str}" | awk 'BEGIN {FS="-i"} {print $2}' | awk 'BEGIN {FS="-o"} {print $1}' | awk 'BEGIN {FS="-e"} {print $1}')
estr=$(echo "${str}" | awk 'BEGIN {FS="-e"} {print $2}' | awk 'BEGIN {FS="-o"} {print $1}' | awk 'BEGIN {FS="-i"} {print $1}')
ostr=$(echo "${str}" | awk 'BEGIN {FS="-o"} {print $2}' | awk 'BEGIN {FS="-e"} {print $1}' | awk 'BEGIN {FS="-i"} {print $1}')

inputfile="${istr}"
outputfile="${ostr}"
encoding="${estr}"

# call the jar

There was an issue when calling the jar: Java threw a MalformedURLException on a filename with a space.

1

So after reading through the commentary, we decided that although it may not be the right answer for every scenario, the right answer for this specific scenario was to extract the pieces manually.

Because the calling program is pre-built and we aren't updating it any time soon, we can safely assume that this script will always receive a -i, -o, and -e flag, and that there will be spaces between them, so every piece passed in arrives as a separate positional parameter in $*.

And we can assume that the text after a flag is the response to the flag, until another flag is referenced. This leaves us 3 scenarios:

  1. The variable contains one of the flags
  2. The variable contains the first piece of a parameter immediately after the flag
  3. The variable contains part 2+ of a parameter, and the space in the name was interpreted as a split, and needs to be reinserted.
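
The three scenarios can be sketched compactly as follows. This is only an illustration under the same assumptions (exactly the flags -i, -o and -e, and no piece of a file name that is itself one of those flags); the rejoin function and its variable names are made up for the example:

```bash
#!/bin/bash
# rejoin is a hypothetical function that glues split pieces back together.
rejoin() {
    inputfile="" outputfile="" encoding=""
    local current="" arg
    for arg in "$@"; do
        case "$arg" in
            -i) current=inputfile;  continue ;;    # scenario 1: the piece is a flag
            -o) current=outputfile; continue ;;
            -e) current=encoding;   continue ;;
        esac
        [ -n "$current" ] || continue              # skip anything before the first flag
        if [ -z "${!current}" ]; then
            # scenario 2: first piece after the flag
            printf -v "$current" '%s' "$arg"
        else
            # scenario 3: a later piece, so re-insert the space that was lost
            printf -v "$current" '%s %s' "${!current}" "$arg"
        fi
    done
}

rejoin -i hello world.txt -o out.txt -e UTF-8
echo "input=[$inputfile] output=[$outputfile] encoding=[$encoding]"
```

The indirect expansion ${!current} and printf -v are bash features, so this sketch will not work in a plain POSIX sh.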

One of the other issues I kept running into was trying to get string literals to equate to variables in my IF statements. To resolve that issue, I pre-stored all relevant data in array variables, so I could test $variable == $otherVariable.

Although I don't expect it to change, we also handled the case where the three flags appear in a different order than we anticipate (our assumption was that they are listed as i, o, e, but we can't see exactly what is passed). The parameters are dumped into an array in the order they were read in, and a parallel array tracks whether the items in slots 0, 1, 2 relate to i, o, e.

The final result still has one flaw: if there is more than one consecutive space in the filename, the whitespace is trimmed before processing, and I can only account for one space. But seeing as we processed over 4000 files before encountering one with a space, I find it unlikely, given the naming conventions, that we would encounter a name with more than one consecutive space.

At that point, we would have to be stepping in for a rare intervention anyways.
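
To illustrate why a run of spaces cannot be recovered, here is a small simulation. It assumes the caller splits the name the way an unquoted shell expansion does, which is a guess about the vendor software; the file name is made up:

```bash
#!/bin/bash
# Simulate an unquoted caller: the name has three spaces between the two words.
name="hello   world.txt"
set -- -i $name                 # deliberately unquoted, so the name is word-split

echo "argument count: $#"       # the flag plus two pieces of the name
rejoined="$2 $3"                # the best we can do is put a single space back
echo "rejoined: [$rejoined]"
```

The script only ever sees the pieces, not the whitespace that separated them, so the rejoined name has one space where the original had three.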

Final code change is as follows:

#!/bin/bash
IFS='|'

position=-1
ioeArray=("" "" "")
previous=""
flagArr=("-i" "-o" "-e" " ")
ioePattern=(0 1 2)


#echo "for loop:"
for i in $*; do
    #printf "%s\n" "$i"
    if [ "$i" == "${flagArr[0]}" ] || [ "$i" == "${flagArr[1]}" ] || [ "$i" == "${flagArr[2]}" ]; then
        ((position += 1));
        previous=$i;
        case "$i" in
            "${flagArr[0]}")
                ioePattern[$position]=0
                ;;
            "${flagArr[1]}")
                ioePattern[$position]=1
                ;;
            "${flagArr[2]}")
                ioePattern[$position]=2
                ;;
        esac
        continue;
    fi
    if [[ $previous == "-"* ]]; then
        ioeArray[$position]=${ioeArray[$position]}$i;
    else
        ioeArray[$position]=${ioeArray[$position]}" "$i;
    fi
    previous=$i;

done


echo "extracting (${ioeArray[${ioePattern[0]}]}) to (${ioeArray[${ioePattern[1]}]}) with (${ioeArray[${ioePattern[2]}]}) encoding."

inputfile="${ioeArray[${ioePattern[0]}]}"
outputfile="${ioeArray[${ioePattern[1]}]}"
encoding="${ioeArray[${ioePattern[2]}]}"
