
I want to use curl to download the latest version of this file. The site has a robots.txt, however, which I think is what stops me from simply using curl -L -z WorldGuard.zip http://www.curse.com/server-mods/minecraft/worldguard/download to get it. There is a direct link, http://addons.curse.cursecdn.com/files/684/741/worldguard-5.7.3.zip, in the HTML source of the page, and I can use curl on that link directly. Since the direct link is not a permalink, though, I need a way to obtain that URL from the first link (which is a permalink).

If I use curl -L http://www.curse.com/server-mods/minecraft/worldguard/download I end up with this as the output. I've tried using FOR /F "skip=628 tokens=10,11,12,13,14 delims=/ " %%a in ('curl -L http://www.curse.com/server-mods/minecraft/worldguard/download') DO curl -z foo.zip %%a but there appears to be a limit on how many lines skip can handle (similar to the token limit of 31), and it would probably have given me all the lines after that as well (not what I want).

Next, I tried saving the output to a text file and deleting all lines except the one I want; however, I don't know how to delete lines that don't contain a specific string. I was thinking of keeping only the lines that contain "http://addons.curse.cursecdn.com/files/" (in other words, the line with the URL I want), but I have no idea how to do that.
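(For reference, on a system where grep is available, keeping only the lines that contain a given string is a one-liner. The file names below are just illustrative; the first command only fakes a saved page so the example is self-contained.)

```shell
# Sample input standing in for the saved page source (file name is illustrative).
printf 'junk line\nhref="http://addons.curse.cursecdn.com/files/684/741/worldguard-5.7.3.zip"\nmore junk\n' > page.html

# Keep only the lines that contain the marker string; every other line is dropped.
grep 'addons.curse.cursecdn.com/files/' page.html > filtered.html
cat filtered.html
```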

How can I obtain just the URL (or the part that changes: 684/741/worldguard-5.7.3.zip) and, hence, get curl to download it?

Edit: I am open to alternatives if there is no easy way of doing this in a batch script and/or with curl. I am willing to accept answers that use VBScript (.vbs), PowerShell, or anything else that can be executed from a batch file (which should be nearly everything). I'd still prefer batch and curl, to keep it consistent and in one file, and because I already have 90% of what I want in batch. Also, I am not that familiar with things that aren't batch, so I'd prefer it if you explained what the script does.

  • Learn PowerShell. It comes by default with Win7, and has much better functionality (close to other .NET languages). Commented Apr 28, 2013 at 8:39
  • @grawity I'm willing to use other options (I've edited the question to reflect this). If you know how to do this, then feel free to post the script. As I've stated in the edit, I'd prefer it if you briefly explained what each part does.
    – Craft1n3ss
    Commented Apr 28, 2013 at 10:35

3 Answers


The following commands look for the line containing the download link in the .htm file and use a quick-and-dirty method to extract the URL from that line. It's not very robust, but it should work as long as the HTML used for the line 'If your download doesn't begin click here' is not drastically changed.

for /F "tokens=4 delims==" %i in ('findstr download-link source.htm') do @set match=%i
set zipurl=%match:~1,-7%
echo %zipurl%|findstr /R "^http://.*\.zip$"

The attribute 'class="download-link"' identifies the tag that links to the .zip file. Using the equal sign as a delimiter, the fourth token is "http://addons.[...].zip" class. To get rid of the surrounding quotes and the word 'class', a substring of %match% is stored in %zipurl%. The third line is optional, but can be used to check whether the script still works: findstr sets %errorlevel% to zero if the extracted URL starts with 'http://' and ends in '.zip', and to one otherwise. (The regular expression is quoted so that cmd does not treat the leading ^ as an escape character.)

For use in a batch file, replace %i with %%i.

  • Works perfectly! I've tweaked it slightly so that it doesn't require me to save anything to a text file: for /F "tokens=4 delims==" %%i in ('curl -L http://www.curse.com/server-mods/minecraft/worldguard/download ^| findstr download-link') do @set url=%%i This will give an error saying that line 776 is too long, but it shouldn't affect what I want to do. It's crazy how short and simple this is!
    – Craft1n3ss
    Commented Apr 29, 2013 at 7:48

...however, I don't know how to delete lines that don't contain a specific string...

To delete lines that do NOT contain a particular string, see this post: Regular expression to match string not containing a word

There is more information in the post, and various other answers are provided, but the basics of this answer are:

You could use a combination of sed and grep (or sed and find) to filter the lines of the file.

  1. Search/replace the entire file to add a unique "Tag" to the beginning of each line that contains any text.
  2. For all lines that contain the target string, remove the unique "Tag" from the beginning of the line.
  3. At this point, all lines that begin with the unique "Tag" do NOT contain the target string. You can now delete (or do "something else" with) only those lines.
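The three steps above can be sketched with sed (the file name and the "@@TAG@@" marker are illustrative choices, and the -i flag assumes GNU sed; the first command only creates sample data so the sketch is self-contained):

```shell
# Sample file standing in for the captured page (name and @@TAG@@ marker are illustrative).
printf 'line with http://addons.curse.cursecdn.com/files/684/741/worldguard-5.7.3.zip\nsome other line\n' > page.txt

# 1. Prefix every line that contains any text with a unique tag (assumes GNU sed for -i).
sed -i 's/^./@@TAG@@&/' page.txt
# 2. Remove the tag again from lines that contain the target string.
sed -i '/cursecdn/ s/^@@TAG@@//' page.txt
# 3. Lines still starting with the tag do NOT contain the string: delete them.
sed -i '/^@@TAG@@/d' page.txt
cat page.txt
```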
  • I've had a look at this and I'm not sure how to do it in a .bat file. Some of the steps I don't think can be done as simple commands.
    – Craft1n3ss
    Commented Apr 28, 2013 at 7:34
  • @Craft1n3ss - see my other answer (superuser.com/a/588845/144147) for a batch file to extract the URL. Commented Apr 28, 2013 at 18:07

You could do this in fewer steps using sed and grep, but here is a solution using only built-in commands.

@echo off

rem    edit next line to include your filename    
set "zzfilename=captured-page.html"

rem    get the target line
type "%zzfilename%"|find /i "data-href"|find /i ".zip">"zztarget.txt"
for /f "usebackq delims=" %%f in (`type "zztarget.txt"`) do set zzaaa=%%f

rem    change double-quotes to single-quotes
set "zzaaa1=%zzaaa:"='%"

rem    remove unneeded text from the beginning of the line
set "zzaaa2=%zzaaa1:*data-href=gotit%"

rem    remove the "<" and ">" characters
set "zzaaa3=%zzaaa2:<='%"
set "zzaaa4=%zzaaa3:>='%"

rem    from what remains, take only the desired URL
for /f "usebackq tokens=2 delims='" %%f in (`echo %zzaaa4%`) do set "zzgotit=%%f"

rem    show the work and cleanup
set zz
set "zzaaa="
set "zzaaa1="
set "zzaaa2="
set "zzaaa3="
set "zzaaa4="
del "zztarget.txt">nul 2>&1

The complete URL will be in the variable zzgotit.
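For comparison, the sed-and-grep route mentioned at the top can be sketched in a single pipeline. This is a sketch under assumptions: the data-href attribute name is taken from the page markup used above, and the first command only fabricates a one-line stand-in for captured-page.html so the example is self-contained.

```shell
# Sample line standing in for captured-page.html (data-href comes from the page markup).
printf '<a class="download-link" data-href="http://addons.curse.cursecdn.com/files/684/741/worldguard-5.7.3.zip">click here</a>\n' > captured-page.html

# Keep the data-href line, then strip everything except the quoted .zip URL.
grep 'data-href' captured-page.html | sed 's/.*data-href="\([^"]*\.zip\)".*/\1/'
# prints http://addons.curse.cursecdn.com/files/684/741/worldguard-5.7.3.zip
```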
