Parse URL in shell script

Question

I have a URL like:

sftp://user@host.net/some/random/path

I want to extract the user, host, and path from this string. Any part can be of arbitrary length.

Do you have to use a shell script? I'm presuming Bash. Can you use Python instead? — Flukey, Commented May 30, 2011 at 9:08
I'm trying to write a custom Nautilus shell script that opens a new SSH session in a terminal from the current SFTP session in Nautilus on Ubuntu. The URL is in the $NAUTILUS_SCRIPT_CURRENT_URI global variable. But actually, you are right, maybe I can use Python or PHP. — umpirsky, Commented May 30, 2011 at 9:19
I agree with the comments above - using perl/python/php would ease things a lot. (Posting this after providing a bash-solution) — Heals, Commented May 30, 2011 at 9:26
Second part of the question stackoverflow.com/questions/6174906/… — umpirsky, Commented May 30, 2011 at 10:07

Heals · Accepted Answer · 2019-08-05 11:56:58Z

[EDIT 2019] This answer is not meant to be a catch-all solution that works for everything. It was intended to provide a simple alternative to the Python-based version, and it ended up having more features than the original.

It answered the basic question in a bash-only way and was then modified by me multiple times to cover a handful of requests from commenters. At this point, however, I think adding even more complexity would make it unmaintainable. I know not everything is straightforward (checking for a valid port, for example, requires comparing hostport and host), but I would rather not add even more complexity.

[Original answer]

Assuming your URL is passed as the first parameter to the script:

#!/bin/bash

# extract the protocol
proto="$(echo "$1" | grep :// | sed -e 's,^\(.*://\).*,\1,g')"
# remove the protocol
url="${1/$proto/}"
# extract the user (if any)
user="$(echo "$url" | grep @ | cut -d@ -f1)"
# extract the host and port
hostport="$(echo "${url/$user@/}" | cut -d/ -f1)"
# by request: host without port
host="$(echo "$hostport" | sed -e 's,:.*,,g')"
# by request: try to extract the port (digits after the last ":", empty if there is none,
# so digits inside the host name such as "host1.net" are not mistaken for a port)
port="$(echo "$hostport" | sed -n -e 's,^.*:\([0-9]*\)$,\1,p')"
# extract the path (if any)
path="$(echo "$url" | grep / | cut -d/ -f2-)"

echo "url: $url"
echo "  proto: $proto"
echo "  user: $user"
echo "  host: $host"
echo "  port: $port"
echo "  path: $path"

I must admit this is not the cleanest solution, but it doesn't rely on another scripting language like Perl or Python. (A solution using one of them would produce cleaner results ;) )

Using your example the results are:

url: user@host.net/some/random/path
  proto: sftp://
  user: user
  host: host.net
  port:
  path: some/random/path

This will also work for URLs without a protocol/username or path. In this case the respective variable will contain an empty string.

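[EDIT]
If your shell won't cope with the substitutions (${1/$proto/}), try this. (A "Bad substitution" error like the one in the comments below usually means the script is being run by a POSIX sh such as dash rather than by bash.)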

#!/bin/bash

# extract the protocol
proto="$(echo "$1" | grep :// | sed -e 's,^\(.*://\).*,\1,g')"

# remove the protocol -- updated (anchored, so an empty $proto does not produce an invalid "s,,,g")
url="$(echo "$1" | sed -e "s,^$proto,,")"

# extract the user (if any)
user="$(echo "$url" | grep @ | cut -d@ -f1)"

# extract the host and port -- updated
hostport="$(echo "$url" | sed -e "s,^$user@,," | cut -d/ -f1)"

# by request: host without port
host="$(echo "$hostport" | sed -e 's,:.*,,g')"
# by request: try to extract the port (digits after the last ":", empty if there is none)
port="$(echo "$hostport" | sed -n -e 's,^.*:\([0-9]*\)$,\1,p')"

# extract the path (if any)
path="$(echo "$url" | grep / | cut -d/ -f2-)"

Wow, this looks great. But I got test: 6: Bad substitution test: 10: Bad substitution url: proto: sftp:// user: host: path: — umpirsky, Commented May 30, 2011 at 9:31
@Coyote updated to extract the port (which is slightly more complex..) — Heals, Commented Jul 5, 2016 at 6:23
@ñull what it should do in your view and what the solution was designed to do are not the same. This has never provided, and will never provide, the host without the port. — Heals, Commented May 28, 2019 at 10:53
@ñull it’s an interesting way to complain that you want a change. Now it has a hostport and host but originally I expected people to be able to come up with changes on their own after reading it. My bash sample already does more than the accepted python answer. — Heals, Commented Jun 14, 2019 at 10:22
Although it quickly becomes cumbersome, using shell parameter expansion might be simpler for some use cases. For example, ${1##*//} strips the protocol and ${1%%\?*} removes the query parameters. — Javier Palacios, Commented Apr 28, 2020 at 15:27
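The parameter-expansion idea from the comment above can be pushed a bit further. The following is a minimal sketch, not taken from any answer here: the function name `split_url` and the global variables it sets are hypothetical, and it assumes URLs of the form `scheme://[user[:password]@]host[:port][/path][?query]` (no IPv6 literals; a password, if present, stays inside `user`). It spawns no external processes.

```shell
#!/bin/bash
# Hypothetical helper: split a URL using only bash parameter expansion.
# Sets the globals proto, user, host, port, path and query (empty if absent).
split_url() {
    local rest="$1"
    proto="" user="" host="" port="" path="" query=""

    # Protocol: everything before the first "://", if there is one.
    if [[ $rest == *://* ]]; then
        proto=${rest%%://*}
        rest=${rest#*://}
    fi

    # Query: everything after the first "?", if there is one.
    if [[ $rest == *\?* ]]; then
        query=${rest#*\?}
        rest=${rest%%\?*}
    fi

    # Path: everything from the first "/" of what is left.
    if [[ $rest == */* ]]; then
        path=/${rest#*/}
        rest=${rest%%/*}
    fi

    # User (and password): everything before the last "@".
    if [[ $rest == *@* ]]; then
        user=${rest%@*}
        rest=${rest##*@}
    fi

    # Port: whatever follows the last ":" of host[:port].
    if [[ $rest == *:* ]]; then
        port=${rest##*:}
        rest=${rest%:*}
    fi
    host=$rest
}

split_url "sftp://user@host.net/some/random/path"
echo "proto=$proto user=$user host=$host port=$port path=$path"
```

The order matters: the query and path are cut off first, so a "@" or ":" appearing later in the path cannot be mistaken for the user or port separator.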

pjz · Accepted Answer · 2013-06-25 02:00:18Z

The above, refined (added password and port parsing), and working in /bin/sh:

# extract the protocol
proto="$(echo "$DATABASE_URL" | grep '://' | sed -e 's,^\(.*://\).*,\1,g')"
# remove the protocol (anchored, so an empty $proto does not produce an invalid "s,,,g")
url="$(echo "$DATABASE_URL" | sed -e "s,^$proto,,")"

# extract the user and password (if any)
userpass="$(echo "$url" | grep @ | cut -d@ -f1)"
pass="$(echo "$userpass" | grep : | cut -d: -f2)"
if [ -n "$pass" ]; then
    user="$(echo "$userpass" | grep : | cut -d: -f1)"
else
    user="$userpass"
fi

# extract the host -- updated
hostport="$(echo "$url" | sed -e "s,^$userpass@,," | cut -d/ -f1)"
port="$(echo "$hostport" | grep : | cut -d: -f2)"
if [ -n "$port" ]; then
    host="$(echo "$hostport" | grep : | cut -d: -f1)"
else
    host="$hostport"
fi

# extract the path (if any)
path="$(echo "$url" | grep / | cut -d/ -f2-)"

Posted because I needed it, so I wrote it (based on @Shirkin's answer, obviously), and I figured someone else might appreciate it.

+1, based on answer function I created a helper function which sets env vars: gist.github.com/maersu/2e050f6399e11348804bf162a301fb82 — maersu, Commented Apr 9, 2020 at 5:47

Community · Accepted Answer · 2021-10-07 07:59:29Z


This solution works in principle the same way as Adam Ryczkowski's in this thread, but it has an improved regular expression based on RFC 3986 (with some changes) and fixes some errors (e.g. userinfo can contain the '_' character). It can also parse relative URIs (e.g. to extract the query or fragment).

#!/bin/bash

# Following regex is based on https://www.rfc-editor.org/rfc/rfc3986#appendix-B with
# additional sub-expressions to split authority into userinfo, host and port
#
readonly URI_REGEX='^(([^:/?#]+):)?(//((([^:/?#]+)@)?([^:/?#]+)(:([0-9]+))?))?(/([^?#]*))(\?([^#]*))?(#(.*))?'
#                    ↑↑            ↑  ↑↑↑            ↑         ↑ ↑            ↑ ↑        ↑  ↑        ↑ ↑
#                    |2 scheme     |  ||6 userinfo   7 host    | 9 port       | 11 rpath |  13 query | 15 fragment
#                    1 scheme:     |  |5 userinfo@             8 :…           10 path    12 ?…       14 #…
#                                  |  4 authority
#                                  3 //…

parse_scheme () {
    [[ "$@" =~ $URI_REGEX ]] && echo "${BASH_REMATCH[2]}"
}

parse_authority () {
    [[ "$@" =~ $URI_REGEX ]] && echo "${BASH_REMATCH[4]}"
}

parse_user () {
    [[ "$@" =~ $URI_REGEX ]] && echo "${BASH_REMATCH[6]}"
}

parse_host () {
    [[ "$@" =~ $URI_REGEX ]] && echo "${BASH_REMATCH[7]}"
}

parse_port () {
    [[ "$@" =~ $URI_REGEX ]] && echo "${BASH_REMATCH[9]}"
}

parse_path () {
    [[ "$@" =~ $URI_REGEX ]] && echo "${BASH_REMATCH[10]}"
}

parse_rpath () {
    [[ "$@" =~ $URI_REGEX ]] && echo "${BASH_REMATCH[11]}"
}

parse_query () {
    [[ "$@" =~ $URI_REGEX ]] && echo "${BASH_REMATCH[13]}"
}

parse_fragment () {
    [[ "$@" =~ $URI_REGEX ]] && echo "${BASH_REMATCH[15]}"
}

answered Aug 31, 2017 at 8:51 by Patryk Obara; edited Oct 7, 2021 at 7:59 by Community (bot)


Isn't this running that regex for EACH part of the URL you're trying to parse? Adam's method may not have the perfect regex, but it only matches the pattern once.
– Auspex
Commented Sep 5, 2017 at 13:33
Yes, it does. Of course, if you want to obtain more than one value from the URI (as in the original question), then it's appropriate to extract the exact strings from the BASH_REMATCH array (if you care more about speed than readability), exactly as @adam-ryczkowski did.
– Patryk Obara
Commented Sep 5, 2017 at 13:59
Thanks. Anyway, I used the regex for an application I have running in a docker container, where I didn't want to have to modify somebody else's docker image just to get sed...
– Auspex
Commented Sep 6, 2017 at 14:23
The syntax and semantics of URIs vary from scheme to scheme, as described by the defining specification for each scheme. Implementations may use scheme-specific rules, at further processing cost, to reduce the probability of false negatives. For example, because the "http" scheme makes use of an authority component, has a default port of "80", and defines an empty path to be equivalent to "/". So it looks to me like the path in your regex should be optional (see rfc-editor.org/rfc/rfc3986#section-6.2.3).
– Максим Шатов
Commented Feb 16, 2022 at 10:54
Probably want "$*" in those functions, not "$@"
– glenn jackman
Commented Aug 1, 2023 at 18:00


johnsyweb · Accepted Answer · 2011-05-30 09:42:03Z


Using Python (best tool for this job, IMHO):

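#!/usr/bin/env python3

import os
from urllib.parse import urlparse  # Python 3 location of the old urlparse module

uri = os.environ['NAUTILUS_SCRIPT_CURRENT_URI']
result = urlparse(uri)
user = result.username   # None if the URI has no user part
host = result.hostname   # host without the port, lowercased
path = result.path
print('user=', user)
print('host=', host)
print('path=', path)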

Further reading:

answered May 30, 2011 at 9:32 by johnsyweb; edited May 30, 2011 at 9:42

Added os.system("gnome-terminal --execute /usr/bin/ssh " + user + "@" + host) at the bottom to do the work ;)
– umpirsky
Commented May 30, 2011 at 9:42
@umpirsky: Delighted to hear it. I've updated my answer with some links in case you need to customise it.
– johnsyweb
Commented May 30, 2011 at 9:43
@umpirsky: That wasn't part of the question! In that case, you can use [...]ssh result.netloc to save splitting the user from the host only to join them back together... (and do away with the print calls.)
– johnsyweb
Commented May 30, 2011 at 9:45
@Johnsyweb Sure, will do that. I'm just wondering how to set current directory when I ssh..
– umpirsky
Commented May 30, 2011 at 9:53

Very old but unacceptable answer. "In shell script" is clearly stated. A Python solution is not an answer, just as a Java solution would not be one.
– Javier Palacios
Commented Apr 28, 2020 at 15:23


Abdullah Al Farooq · Accepted Answer · 2020-03-11 12:31:09Z


You can use bash string manipulation, which is easy to learn; try it if you find regular expressions difficult. Since the URL comes from NAUTILUS_SCRIPT_CURRENT_URI, I guess it may contain a port, so I also handled an optional port.

#!/bin/bash

#You can also use environment variable $NAUTILUS_SCRIPT_CURRENT_URI
X="sftp://user@host.net/some/random/path"

tmp=${X#*//};usr=${tmp%@*}
tmp=${X#*@};host=${tmp%%/*};[[ ${X#*://} == *":"* ]] && host=${host%:*}
tmp=${X#*//};path=${tmp#*/}
proto=${X%%:*}   # longest match: everything before the first ":"
[[ ${X#*://} == *":"* ]] && tmp=${X##*:} && port=${tmp%%/*}

echo "Protocol: $proto User: $usr Host: $host Port: $port Path: $path"

answered Mar 11, 2020 at 3:24 by Abdullah Al Farooq; edited Mar 11, 2020 at 12:31


The proto expression needs to do longest-match rather than shortest-match, so ${X%%:*} (double the percent sign). Otherwise, given valid (but admittedly weird) input like ssh://[email protected]:1234/some/path, the second colon will match instead of the first, and protocol will be reported as ssh://[email protected].
– Ti Strga
Commented Apr 4, 2023 at 21:35


Community · Accepted Answer · 2021-10-07 08:46:00Z

I don't have enough reputation to comment, but I made a small modification to @patryk-obara's answer.

RFC3986 § 6.2.3. Scheme-Based Normalization treats

http://example.com
http://example.com/

as equivalent. But I found that his regex did not match a URL like http://example.com. http://example.com/ (with the trailing slash) does match.

I inserted group 11, which changes / to (/|$). This matches either / or the end of the string, so http://example.com now matches as well. I also anchored the whole regex with a trailing $.

readonly URI_REGEX='^(([^:/?#]+):)?(//((([^:/?#]+)@)?([^:/?#]+)(:([0-9]+))?))?((/|$)([^?#]*))(\?([^#]*))?(#(.*))?$'
#                    ↑↑            ↑  ↑↑↑            ↑         ↑ ↑            ↑↑    ↑        ↑  ↑        ↑ ↑
#                    ||            |  |||            |         | |            ||    |        |  |        | |
#                    |2 scheme     |  ||6 userinfo   7 host    | 9 port       ||    12 rpath |  14 query | 16 fragment
#                    1 scheme:     |  |5 userinfo@             8 :...         ||             13 ?...     15 #...
#                                  |  4 authority                             |11 / or end-of-string
#                                  3  //...                                   10 path
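A quick way to check the modified regex is to run both URLs from RFC 3986 § 6.2.3 through it. This is a sketch assuming bash with POSIX leftmost-longest regex matching (as with glibc); group numbers follow the second diagram above (7 = host, 10 = path), and the variable name `uri_regex_eos` is only for illustration.

```shell
#!/bin/bash
# Modified regex from the answer above: group 11 accepts "/" or end of string.
uri_regex_eos='^(([^:/?#]+):)?(//((([^:/?#]+)@)?([^:/?#]+)(:([0-9]+))?))?((/|$)([^?#]*))(\?([^#]*))?(#(.*))?$'

for uri in "http://example.com" "http://example.com/"; do
    if [[ $uri =~ $uri_regex_eos ]]; then
        # Group 7 is the host, group 10 the whole path (possibly empty).
        echo "$uri -> host='${BASH_REMATCH[7]}' path='${BASH_REMATCH[10]}'"
    else
        echo "$uri -> no match"
    fi
done
```

Both URLs should report the same host; only the path group differs (empty versus "/"), which is the equivalence the RFC section describes.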

relistan · Accepted Answer · 2015-12-24 07:31:05Z

If you really want to do it in shell, you can do something as simple as the following by using awk. This requires knowing in advance how many fields you will get (e.g. it breaks if some URLs include a password and others don't).

#!/bin/bash

# Split on runs of "/", "@" and ":" and print the pieces in order (bash arrays are 0-based).
FIELDS=($(echo "sftp://user@host.net/some/random/path" \
  | awk '{ n = split($0, arr, /[\/@:]+/); for (i = 1; i <= n; i++) print arr[i] }'))
proto=${FIELDS[0]}
user=${FIELDS[1]}
host=${FIELDS[2]}
path=$(echo "${FIELDS[@]:3}" | sed 's/ /\//g')

If you don't have awk but do have grep, and you can require that each field consist only of lowercase letters, digits, dots, and hyphens (and be reasonably predictable in format), then you can do:

#!/bin/bash

# Each run of [a-z0-9.-] becomes one array element (bash arrays are 0-based).
FIELDS=($(echo "sftp://user@host.net/some/random/path" \
   | grep -o "[a-z0-9.-][a-z0-9.-]*" | tr '\n' ' '))
proto=${FIELDS[0]}
user=${FIELDS[1]}
host=${FIELDS[2]}
path=$(echo "${FIELDS[@]:3}" | sed 's/ /\//g')

Stam · Accepted Answer · 2016-06-10 14:47:40Z

I just needed to do the same, so I was curious whether it's possible to do it in a single line, and this is what I've got:

#!/bin/bash

parse_url() {
  eval $(echo "$1" | sed -e "s#^\(\(.*\)://\)\?\(\([^:@]*\)\(:\(.*\)\)\?@\)\?\([^/?]*\)\(/\(.*\)\)\?#${PREFIX:-URL_}SCHEME='\2' ${PREFIX:-URL_}USER='\4' ${PREFIX:-URL_}PASSWORD='\6' ${PREFIX:-URL_}HOST='\7' ${PREFIX:-URL_}PATH='\9'#")
}

URL=${1:-"http://user:[email protected]/path/somewhere"}
PREFIX="URL_" parse_url "$URL"
echo "$URL_SCHEME://$URL_USER:$URL_PASSWORD@$URL_HOST/$URL_PATH"

How it works:

A (rather crazy) sed regex captures all parts of the URL; every part is optional except the host name.
Using those capture groups, sed outputs shell variable assignments for the relevant parts (like URL_SCHEME or URL_USER).
eval executes that output, which sets those variables in the current shell so they are available to the rest of the script (they are not exported).
Optionally, PREFIX can be passed to control the names of the output variables.

PS: be careful when using this on arbitrary input, since this code is vulnerable to code injection through eval.

eval "$(sed -e "s#^((.*)://)\?(([^:@]*)(:(.*))\?@)\?([^/?]*)\?(:([0-9]*))(/(.*))\?#${PREFIX:-URL_}HOST='\7' ${PREFIX:-URL_}PORT='\9'#" <<< "$URL")" The problem is gnu.org/software/sed/manual/… only supports from 1 to 9. — FourDollars, Commented Sep 12, 2023 at 11:57

sschuberth · Accepted Answer · 2016-11-18 15:45:02Z

Here's my take, loosely based on some of the existing answers, but it can also cope with GitHub SSH clone URLs:

#!/bin/bash

PROJECT_URL="git@github.com:heremaps/here-aaa-java-sdk.git"

# Extract the protocol (includes trailing "://").
PARSED_PROTO="$(echo $PROJECT_URL | sed -nr 's,^(.*://).*,\1,p')"

# Remove the protocol from the URL.
PARSED_URL="$(echo ${PROJECT_URL/$PARSED_PROTO/})"

# Extract the user (includes trailing "@").
PARSED_USER="$(echo $PARSED_URL | sed -nr 's,^(.*@).*,\1,p')"

# Remove the user from the URL.
PARSED_URL="$(echo ${PARSED_URL/$PARSED_USER/})"

# Extract the port (includes leading ":").
PARSED_PORT="$(echo $PARSED_URL | sed -nr 's,.*(:[0-9]+).*,\1,p')"

# Remove the port from the URL.
PARSED_URL="$(echo ${PARSED_URL/$PARSED_PORT/})"

# Extract the path (includes leading "/" or ":").
PARSED_PATH="$(echo $PARSED_URL | sed -nr 's,[^/:]*([/:].*),\1,p')"

# Remove the path from the URL.
PARSED_HOST="$(echo ${PARSED_URL/$PARSED_PATH/})"

echo "proto: $PARSED_PROTO"
echo "user: $PARSED_USER"
echo "host: $PARSED_HOST"
echo "port: $PARSED_PORT"
echo "path: $PARSED_PATH"

which gives

proto:
user: git@
host: github.com
port:
path: :heremaps/here-aaa-java-sdk.git

And for PROJECT_URL="ssh://sschuberth@git.eclipse.org:29418/jgit/jgit" you get

proto: ssh://
user: sschuberth@
host: git.eclipse.org
port: :29418
path: /jgit/jgit

Yan Foto · Accepted Answer · 2019-08-03 15:33:38Z


If you have access to Bash >= 3.0 you can do this in pure bash as well, thanks to the re-match operator =~:

pattern='^(([[:alnum:]]+)://)?(([[:alnum:]]+)@)?([^:^@]+)(:([[:digit:]]+))?$'
if [[ "http://[email protected]:3142" =~ $pattern ]]; then
        proto=${BASH_REMATCH[2]}
        user=${BASH_REMATCH[4]}
        host=${BASH_REMATCH[5]}
        port=${BASH_REMATCH[7]}
fi

It should be faster and less resource-hungry than all the previous examples, because no external process is spawned.

answered Aug 5, 2017 at 12:26 by Adam Ryczkowski; edited Aug 3, 2019 at 15:33 by Yan Foto


Unfortunately this would include path segment as part of the host name.
– Yan Foto
Commented Aug 15, 2019 at 12:50


ccpizza · Accepted Answer · 2021-04-18 13:58:41Z

A simplistic approach to get just the domain from the full URL:

echo https://stackoverflow.com/questions/6174220/parse-url-in-shell-script | cut -d/ -f1-3

# OUTPUT>>> https://stackoverflow.com

Get only the path:

echo https://stackoverflow.com/questions/6174220/parse-url-in-shell-script | cut -d/ -f4-

# OUTPUT>>> questions/6174220/parse-url-in-shell-script

Not perfect, as the second command strips the leading slash, so you'll need to prepend it by hand.
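Prepending the slash can be done right in the assignment. A minimal sketch using the example URL above; the variable names `origin` and `path` are only illustrative:

```shell
#!/bin/bash
url="https://stackoverflow.com/questions/6174220/parse-url-in-shell-script"

# Fields 1-3 are "https:", the empty field between the two slashes, and the host.
origin=$(echo "$url" | cut -d/ -f1-3)

# Everything from field 4 on is the path; put back the "/" that cut removed.
path="/$(echo "$url" | cut -d/ -f4-)"

echo "$origin"
echo "$path"
```

For a URL with no path at all (e.g. https://example.com), `cut -f4-` prints an empty line, so `path` simply becomes `/`.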

An awk-based approach for getting just the path without the domain:

echo https://stackoverflow.com/questions/6174220/parse-url-in-shell-script/59971653 | awk -F"/" '{ for (i=4; i<=NF; i++) printf"/%s", $i }'

# OUTPUT>>> /questions/6174220/parse-url-in-shell-script/59971653

Lisias · Accepted Answer · 2016-12-15 10:13:17Z

I did further parsing, expanding the solution given by @Shirkrin:

#!/bin/bash

parse_url() {
    local query1 query2 path1 path2

    # extract the protocol
    proto="$(echo $1 | grep :// | sed -e's,^\(.*://\).*,\1,g')"

    if [[ ! -z $proto ]] ; then
            # remove the protocol
            url="$(echo ${1/$proto/})"

            # extract the user (if any)
            login="$(echo $url | grep @ | cut -d@ -f1)"

            # extract the host
            host="$(echo ${url/$login@/} | cut -d/ -f1)"

            # by request - try to extract the port
            port="$(echo $host | sed -e 's,^.*:,:,g' -e 's,.*:\([0-9]*\).*,\1,g' -e 's,[^0-9],,g')"

            # extract the uri (if any)
            resource="/$(echo $url | grep / | cut -d/ -f2-)"
    else
            url=""
            login=""
            host=""
            port=""
            resource=$1
    fi

    # extract the path (if any)
    path1="$(echo "$resource" | grep '?' | cut -d'?' -f1 )"
    path2="$(echo "$resource" | grep \# | cut -d# -f1 )"
    path=$path1
    if [[ -z $path ]] ; then path=$path2 ; fi
    if [[ -z $path ]] ; then path=$resource ; fi

    # extract the query (if any); "?" is quoted so the shell never treats it as a glob
    query1="$(echo "$resource" | grep '?' | cut -d'?' -f2-)"
    query2="$(echo "$query1" | grep \# | cut -d\# -f1 )"
    query=$query2
    if [[ -z $query ]] ; then query=$query1 ; fi

    # extract the fragment (if any)
    fragment="$(echo "$resource" | grep \# | cut -d\# -f2 )"

    echo "url: $url"
    echo "   proto: $proto"
    echo "   login: $login"
    echo "    host: $host"
    echo "    port: $port"
    echo "resource: $resource"
    echo "    path: $path"
    echo "   query: $query"
    echo "fragment: $fragment"
    echo ""
}

parse_url "http://login:[email protected]:8080/one/more/dir/file.exe?a=sth&b=sth#anchor_fragment"
parse_url "https://example.com/one/more/dir/file.exe#anchor_fragment"
parse_url "http://login:[email protected]:8080/one/more/dir/file.exe#anchor_fragment"
parse_url "ftp://[email protected]:8080/one/more/dir/file.exe?a=sth&b=sth"
parse_url "/one/more/dir/file.exe"
parse_url "file.exe"
parse_url "file.exe#anchor"

user3132194 · Accepted Answer · 2019-04-11 07:22:11Z

I did not like the methods above and wrote my own. It is for FTP links; just replace ftp with http if you need it. The first line is a small validation of the link, which should look like ftp://user:[email protected]/path/to/something.

if ! echo "$url" | grep -q '^[[:blank:]]*ftp://[[:alnum:]]\+:[[:alnum:]]\+@[[:alnum:]\.]\+/.*[[:blank:]]*$'; then return 1; fi

login=$(  echo "$url" | sed 's|[[:blank:]]*ftp://\([^:]\+\):\([^@]\+\)@\([^/]\+\)\(/.*\)[[:blank:]]*|\1|' )
pass=$(   echo "$url" | sed 's|[[:blank:]]*ftp://\([^:]\+\):\([^@]\+\)@\([^/]\+\)\(/.*\)[[:blank:]]*|\2|' )
host=$(   echo "$url" | sed 's|[[:blank:]]*ftp://\([^:]\+\):\([^@]\+\)@\([^/]\+\)\(/.*\)[[:blank:]]*|\3|' )
dir=$(    echo "$url" | sed 's|[[:blank:]]*ftp://\([^:]\+\):\([^@]\+\)@\([^/]\+\)\(/.*\)[[:blank:]]*|\4|' )

My actual goal was to check ftp access by url. Here is the full result:

#!/bin/bash

test_ftp_url()  # lftp may hang on some ftp problems, like no connection
    {
    local url="$1"

    if ! echo "$url" | grep -q '^[[:blank:]]*ftp://[[:alnum:]]\+:[[:alnum:]]\+@[[:alnum:]\.]\+/.*[[:blank:]]*$'; then return 1; fi

    local login=$(  echo "$url" | sed 's|[[:blank:]]*ftp://\([^:]\+\):\([^@]\+\)@\([^/]\+\)\(/.*\)[[:blank:]]*|\1|' )
    local pass=$(   echo "$url" | sed 's|[[:blank:]]*ftp://\([^:]\+\):\([^@]\+\)@\([^/]\+\)\(/.*\)[[:blank:]]*|\2|' )
    local host=$(   echo "$url" | sed 's|[[:blank:]]*ftp://\([^:]\+\):\([^@]\+\)@\([^/]\+\)\(/.*\)[[:blank:]]*|\3|' )
    local dir=$(    echo "$url" | sed 's|[[:blank:]]*ftp://\([^:]\+\):\([^@]\+\)@\([^/]\+\)\(/.*\)[[:blank:]]*|\4|' )

    exec 3>&2 2>/dev/null
    exec 6<>"/dev/tcp/$host/21" || { exec 2>&3 3>&-; echo 'Bash network support is disabled. Skipping ftp check.'; return 0; }

    read <&6
    if ! echo "${REPLY//$'\r'}" | grep -q '^220'; then exec 2>&3  3>&- 6>&-; return 3; fi   # 220 vsFTPd 3.0.2+ (ext.1) ready...

    echo -e "USER $login\r" >&6; read <&6
    if ! echo "${REPLY//$'\r'}" | grep -q '^331'; then exec 2>&3  3>&- 6>&-; return 4; fi   # 331 Please specify the password.

    echo -e "PASS $pass\r" >&6; read <&6
    if ! echo "${REPLY//$'\r'}" | grep -q '^230'; then exec 2>&3  3>&- 6>&-; return 5; fi   # 230 Login successful.

    echo -e "CWD $dir\r" >&6; read <&6
    if ! echo "${REPLY//$'\r'}" | grep -q '^250'; then exec 2>&3  3>&- 6>&-; return 6; fi   # 250 Directory successfully changed.

    echo -e "QUIT\r" >&6

    exec 2>&3  3>&- 6>&-
    return 0
    }

test_ftp_url 'ftp://fz223free:[email protected]/out/nsi/nsiProtocol/daily'
echo "$?"

huynhbaoan · Accepted Answer · 2021-07-26 05:36:21Z

I found Adam Ryczkowski's answers helpful. The original solution did not handle /path in URL, so I enhanced it a little bit.

# Group 8 (path) is now an optional "/" followed by anything; the previous
# (\/?[^:^@]?) allowed at most one character after the slash, so "/path" never matched.
pattern='^(([[:alnum:]]+):\/\/)?(([[:alnum:]]+)@)?([^:^@\/]+)(:([[:digit:]]+))?(\/.*)?$'
url="http://[email protected]:3142/path"
if [[ "$url" =~ $pattern ]]; then
    proto=${BASH_REMATCH[2]}
    user=${BASH_REMATCH[4]}
    host=${BASH_REMATCH[5]}
    port=${BASH_REMATCH[7]}
    path=${BASH_REMATCH[8]}
    echo "proto: $proto"
    echo "user: $user"
    echo "host: $host"
    echo "port: $port"
    echo "path: $path"
else
    echo "URL did not match pattern: $url"
fi

The pattern is complex, so please use this site to understand it better: https://regex101.com/

I tested it with a bunch of URLs. However, if there are any issues, please let me know.

skagedal · Accepted Answer · 2024-01-13 18:24:57Z

The accepted answer reinterpreted the question, XY Problem style, which turned out to be the right thing for the OP. Good for them. However, I suspect most people finding this question are looking for approaches to parsing a URL within an actual shell script (unspecified which shell, but I'll assume something like bash).

Most other answers focus on doing this parsing entirely using built-in shell mechanisms, or POSIX standard tools such as sed. In many situations I think the best approach would be to depend on an external tool which handles the nitty-gritty of URL parsing, while integrating well with a shell script workflow.

The choice of tool would probably depend on use case and intended audience. Here are some alternatives; all of them expect the URL to be in $URL.

Using Perl

Perl is a nice option as it is pretty ubiquitous.

perl -mURI -E 'say URI->new($ARGV[0])->path()' -- "$URL"

Change path() to userinfo(), host() or other methods available here.
Use print instead of say to omit the final newline.

Using Python

Python is now one of the most widely used programming languages, so that's a benefit. It's not quite as convenient to integrate with the shell for one-liners, as for example Perl is, though.

python3 -c 'import sys, urllib.parse; print(urllib.parse.urlparse(sys.argv[1]).path)' "$URL"

See here for other methods available on the parsed URL
Use sys.stdout.write instead of print to omit the final newline
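When several parts are needed, calling Python once and reading its output into shell variables avoids starting one interpreter per component. A minimal sketch, assuming `python3` is on the PATH and that no component contains a newline; it relies on `urllib.parse.urlsplit` and its `username`, `hostname`, `port` and `path` attributes (`port` raises an error for a non-numeric port). The example URL and variable names are illustrative.

```shell
#!/bin/bash
URL="sftp://user@host.net:2222/some/random/path"

# Python prints one component per line (empty line if absent), so that
# empty components keep their position when read back into variables.
{
    IFS= read -r scheme
    IFS= read -r user
    IFS= read -r host
    IFS= read -r port
    IFS= read -r path
} < <(python3 -c '
import sys
from urllib.parse import urlsplit
u = urlsplit(sys.argv[1])
for part in (u.scheme, u.username, u.hostname, u.port, u.path):
    print("" if part is None else part)
' "$URL")

echo "scheme=$scheme user=$user host=$host port=$port path=$path"
```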

Using Node.js

node -e "console.log(new URL(process.argv[1]).pathname)" "$URL"

See here for other methods available on the parsed URL
Use process.stdout.write instead of console.log to omit the final newline

Using trurl

trurl, sharing code with the curl URL fetching utility, is a nice option if requiring a custom tool is acceptable.

trurl "$URL" -g '[host]'

Brady Holt · Accepted Answer · 2019-12-29 18:35:09Z


If you have access to Node.js:

export MY_URI=sftp://user@host.net/some/random/path
node -e "console.log(url.parse(process.env.MY_URI).auth)"
node -e "console.log(url.parse(process.env.MY_URI).host)"
node -e "console.log(url.parse(process.env.MY_URI).path)"

This will output:

user
host.net
/some/random/path

answered Dec 29, 2019 at 18:35 by Brady Holt


briceburg · Accepted Answer · 2021-03-30 05:02:41Z

Here's a pure bash URL parser. It supports git SSH clone-style URLs as well as standard proto:// ones. The example ignores the protocol, credentials, and port, but you can modify it to collect them as needed. I used regex101 for handy testing: https://regex101.com/r/5QyNI5/1

TEST_URLS=(
  https://github.com/briceburg/tools.git
  https://foo:[email protected]:8080/briceburg/tools.git
  git@github.com:briceburg/tools.git
  https://[email protected]:[email protected]:443/p/a/t/h
)

for url in "${TEST_URLS[@]}"; do
  without_proto="${url#*:\/\/}"
  without_auth="${without_proto##*@}"
  [[ $without_auth =~ ^([^:\/]+)(:[[:digit:]]+\/|:|\/)?(.*) ]]
  PROJECT_HOST="${BASH_REMATCH[1]}"
  PROJECT_PATH="${BASH_REMATCH[3]}"

  echo "given: $url"
  echo "  -> host: $PROJECT_HOST path: $PROJECT_PATH"
done

results in:

given: https://github.com/briceburg/tools.git
  -> host: github.com path: briceburg/tools.git
given: https://foo:[email protected]:8080/briceburg/tools.git
  -> host: github.com path: briceburg/tools.git
given: git@github.com:briceburg/tools.git
  -> host: github.com path: briceburg/tools.git
given: https://[email protected]:[email protected]:443/p/a/t/h
  -> host: my.site.com path: p/a/t/h

Tags: parsing, shell, url
