
I have a link that includes 12 files, and I want to download all of them with a single Wget command. I use Cygwin as the terminal to run Wget.

The link is https://hydro1.gesdisc.eosdis.nasa.gov/data/NLDAS/NLDAS_FORA0125_M.002/1985/ and I only want the .grb files under this link. I have tried the command below, but it downloads only the .xml files.

I found some advice at https://disc.sci.gsfc.nasa.gov/recipes/?q=recipes/How-to-Download-Data-Files-from-HTTP-Service-with-wget , but I still cannot solve the problem. Thanks for any help.

wget --load-cookies ~/.urs_cookies --save-cookies ~/.urs_cookies --keep-session-cookies -r -c -nH -nd -np -A nc4,xml "https://hydro1.gesdisc.eosdis.nasa.gov/data/NLDAS/NLDAS_FORA0125_M.002/1985/"

Thanks for all the help. I have fixed that, but it leads to a more complicated question. The site requires a username and password, as it should, but I do not know how to follow the recommendation at https://disc.sci.gsfc.nasa.gov/recipes/?q=recipes/How-to-Download-Data-Files-from-HTTP-Service-with-wget to create a .netrc file and a cookie file.

Here is a brief description of what I need to do.

To run wget, you need to set up .netrc and create a cookie file:
Create a .netrc file in your home directory.
a. cd ~ or cd $HOME
b. touch .netrc
c. echo "machine urs.earthdata.nasa.gov login <uid> password <password>" >> .netrc
     where <uid> is your user name and <password> is your URS password
d. chmod 0600 .netrc (so only you can access it)

Create a cookie file. This file will be used to persist sessions across calls to Wget or Curl. For example:
a. cd ~ or cd $HOME
b. touch .urs_cookies

How can I do this for Wget in Cygwin on Windows?

2 Answers


Looking only at your example, it uses -A nc4,xml, which explains why only .xml files are downloaded: there must not be any nc4 files at that link.

Anyway, here's what man wget says about -A:

Recursive Accept/Reject Options
   -A acclist --accept acclist
   -R rejlist --reject rejlist
       Specify comma-separated lists of file name suffixes or patterns
       to accept or reject. Note that if any of the wildcard
       characters, *, ?, [ or ], appear in an element of acclist or
       rejlist, it will be treated as a pattern, rather than a suffix.
       In this case, you have to enclose the pattern into quotes to
       prevent your shell from expanding it, like in -A "*.mp3" or -A
       '*.mp3'.

So for only grb files, try using -A grb as in:

wget --load-cookies ~/.urs_cookies --save-cookies ~/.urs_cookies --keep-session-cookies -r -c -nH -nd -np -A grb "https://hydro1.gesdisc.eosdis.nasa.gov/data/NLDAS/NLDAS_FORA0125_M.002/1985/"
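
The suffix rule in the man page excerpt above can be illustrated without touching the network. This is only a rough sketch that mimics suffix matching with a shell case pattern; it is not how wget is implemented, and the function name matches_suffix and the file names in the loop are made up for the example.

```shell
# Illustration only: approximate wget's suffix-style -A matching with a
# shell case pattern. The file names used below are made-up examples.
matches_suffix() {
    # $1 = file name, $2 = suffix from the accept list (no wildcards)
    case "$1" in
        *"$2") return 0 ;;
        *)     return 1 ;;
    esac
}

for name in data_198501.grb data_198501.grb.xml index.html; do
    if matches_suffix "$name" grb; then
        echo "$name: accepted by -A grb"
    else
        echo "$name: not accepted by -A grb"
    fi
done
```

Only the name ending in grb is reported as accepted, which is why -A nc4,xml picked up the .xml files and nothing else.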

After Edits:

Username & Password should be fairly easy, try adding these:

   --user=user
   --password=password
       Specify the username user and password password for both FTP
       and HTTP file retrieval.  These parameters can be overridden
       using the --ftp-user and --ftp-password options for FTP
       connections and the --http-user and --http-password options for
       HTTP connections.

Again, this is from man wget. Also read about the --save-cookies file and --load-cookies file options there; getting cookies saved by a web browser to work in wget can be tricky.
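
The .netrc and cookie-file steps quoted in the question can be typed into the Cygwin terminal as they are. As a minimal sketch, they can also be wrapped in a small function. The name setup_urs_files and its three arguments are hypothetical and not part of the NASA recipe; the directory argument defaults to your home directory.

```shell
# Hypothetical helper (not from the NASA recipe): creates .netrc and
# .urs_cookies in the directory given as $1 (default: your home directory).
# $2 = Earthdata user name, $3 = Earthdata password.
setup_urs_files() {
    dir="${1:-$HOME}"
    user="$2"
    pass="$3"

    # Create the file if it is missing, then restrict it to the owner.
    touch "$dir/.netrc"
    chmod 0600 "$dir/.netrc"

    # Append the login line that wget looks up for urs.earthdata.nasa.gov.
    echo "machine urs.earthdata.nasa.gov login $user password $pass" >> "$dir/.netrc"

    # Empty cookie file, later passed to --load-cookies / --save-cookies.
    touch "$dir/.urs_cookies"
}
```

After pasting the function, a call such as setup_urs_files "$HOME" your_username your_password creates both files. Note that the password typed this way ends up in your shell history, and calling the function twice appends the login line twice.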

  • Thank you for your help. It now requires a username and password, as it should. However, I am new to wget and I do not know how to create a .netrc file and a cookie file as recommended by the first link I posted in the question. Could you also take a look? I have edited the question. Thanks a lot.
    – Yang Yang
    Commented Mar 13, 2017 at 22:01
  • Thanks a lot. The tutorial under the link says "touch .netrc". What does this mean?
    – Yang Yang
    Commented Mar 13, 2017 at 22:17
  • touch is a program; it changes the timestamp of a file (and creates the file if it doesn't exist). The tutorial link seems to have comments mixed in with commands, so it's not super clear what they want typed and what they're just talking about. Always look for and read the man page for a new command; sometimes there are info pages too, which may be different. Be careful copying and pasting commands you're unsure of, since some can do strange or unwanted things (but NASA should be a safe enough site to not worry much).
    – Xen2050
    Commented Mar 13, 2017 at 22:22
  • Thanks a lot. I do not know why, but after following your advice, it finally works! One more question: you mention reading a man page. What is a man page? Is it a tutorial or something else?
    – Yang Yang
    Commented Mar 13, 2017 at 22:26
  • man is "an interface to the on-line reference manuals", almost every terminal command has a man page that says what the command does & how to use it. I think Cygwin should have the man pages, and here's their FAQ for Where's the documentation?
    – Xen2050
    Commented Mar 14, 2017 at 18:10

Easy. You're missing an option:

wget --load-cookies ~/.urs_cookies --save-cookies ~/.urs_cookies --keep-session-cookies -r -c -nH -nd -np -R html,xml -A grb "https://hydro1.gesdisc.eosdis.nasa.gov/data/NLDAS/NLDAS_FORA0125_M.002/1985/"

-R rejects all html and xml files, and -A accepts only grb files.

  • Thanks for your help. I have edited the question. Could you also take a look?
    – Yang Yang
    Commented Mar 13, 2017 at 22:09
