
I am trying to create a directory that will house all and only my PDFs compiled from LaTeX. I like keeping each project in a separate folder, all housed in a big folder called LaTeX. So I tried running:

rsync -avn *.pdf ~/LaTeX/ ~/Output/

which should find all the pdfs in ~/LaTeX/ and transfer them to the output folder. This doesn't work. It tells me it's found no matches for "*.pdf". If I leave out this filter, the command lists all the files in all the project folders under LaTeX. So it's a problem with the *.pdf filter. I tried replacing ~/ with the full path to my home directory, but that didn't have an effect.

I'm using zsh. I tried the same thing in bash, and there, even with the filter, it listed every single file in every subdirectory... What's going on here?

Why isn't rsync understanding my pdf only filter?


OK. So, update: now I'm trying

rsync -avn --include="*/" --include="*.pdf" LaTeX/ Output/

And this gives me the whole file list. I guess because everything matches the first pattern...


11 Answers


TL;DR:

rsync -am --include='*.pdf' --include='*/' --exclude='*' ~/LaTeX/ ~/Output/

Rsync copies the source(s) to the destination. If you pass *.pdf as sources, the shell expands this to the list of files with the .pdf extension in the current directory. No recursive traversal happens because you didn't pass any directory as a source.

So you need to run rsync -a ~/LaTeX/ ~/Output/, but with a filter to tell rsync to copy .pdf files only. Rsync's filter rules can seem daunting when you read the manual, but you can construct many examples with just a few simple rules.

  • Inclusions and exclusions:

    • Excluding files by name or by location is easy: --exclude='*~', --exclude=/some/relative/location (relative to the source argument, e.g. this excludes ~/LaTeX/some/relative/location).
    • If you only want to match a few files or locations, include them, include every directory leading to them (for example with --include='*/'), then exclude the rest with --exclude='*'. This is because:
    • If you exclude a directory, this excludes everything below it. The excluded files won't be considered at all.
    • If you include a directory, this doesn't automatically include its contents. In recent versions, --include='directory/***' will do that.
    • For each file, the first matching rule applies (and anything never matched is included).
  • Patterns:

    • If a pattern doesn't contain a /, it applies to the file name sans directory.
    • If a pattern ends with /, it applies to directories only.
    • If a pattern starts with /, it applies to the whole path from the directory that was passed as an argument to rsync.
    • * matches any substring of a single directory component (i.e. it never matches /); ** matches any path substring.
  • If a source argument ends with a /, its contents are copied (rsync -r a/ b creates b/foo for every a/foo). Otherwise the directory itself is copied (rsync -r a b creates b/a).


Thus here we need to include *.pdf, include directories containing them, and exclude everything else.

rsync -a --include='*.pdf' --include='*/' --exclude='*' ~/LaTeX/ ~/Output/

Note that this copies all directories, even the ones that contain no matching file or subdirectory containing one. This can be avoided with the --prune-empty-dirs option (it's not a universal solution since you then can't copy a directory even by matching it explicitly, but that's a rare requirement).

rsync -am --include='*.pdf' --include='*/' --exclude='*' ~/LaTeX/ ~/Output/
  • In contrast to my solution (using zsh's ** pattern), this recreates the directory structure in the target dir. I'm not sure whether this is what the OP wants... Commented Sep 29, 2010 at 12:08
  • @Michael I can't reproduce this. I just tried it with rsync 3.1.1 and rsync 3.1.3, between local directories, and it only lists matching files that are missing or different on the destination (and their directories). Commented Sep 17, 2020 at 16:42
  • Why the -m option?
    – a06e
    Commented Nov 20, 2022 at 20:02
  • @becko Because rsync has to traverse and copy all directories in order to find and copy all the .pdf files. The -m option tells it not to copy directory subtrees that turn out not to have any .pdf file to copy. Commented Nov 21, 2022 at 9:36
rsync -av --include="*/" --include="*.pdf" --exclude="*" ~/LaTeX/ ~/Output/ --dry-run

The default is to include everything, so you must explicitly exclude everything after including the files you want to transfer. Remove the --dry-run to actually transfer the files.

If you start off with:

--exclude '*' --include '*.pdf'

Then the '*' exclude is the first rule every name matches, so everything is excluded right off.

If you try:

--include '*.pdf' --exclude '*' 

Then only pdf files in the top level folder will be transferred. It won't follow any directories, since those are excluded by '*'.

  • As of 2014-03-17 this is the best answer, as it solves the original poster's question exactly. Please vote it up! If you add --prune-empty-dirs (or its shortcut -m) you even spare yourself many empty directories at the destination, unless of course you want them as a reminder or structural blueprint.
    – porg
    Commented Mar 17, 2014 at 23:38
  • Best answer, --include="*/" is key. Commented Aug 5, 2015 at 11:46
  • It didn't work for me.
    – Felipe
    Commented Apr 8, 2021 at 4:13

If you use a pattern like *.pdf, the shell “expands” that pattern, i.e. it replaces the pattern with all matches in the current directory. The command you are running (in this case rsync) is unaware of the fact that you tried to use a pattern.

When you are using zsh, there is an easy solution, though: The ** pattern can be used to match folders recursively. Try this:

rsync -avn ~/LaTeX/**/*.pdf ~/Output/
  • Wouldn't that copy all pdfs from somewhere within the current directory and everything from ~/LaTeX/ to ~/Output?
    – SamB
    Commented Sep 16, 2010 at 18:35
  • I guess you meant rsync -avn ~/LaTeX/**/*.pdf ~/Output, but the solution with --include is more scalable anyway. Commented Sep 16, 2010 at 18:58
  • Sorry, corrected the command I mistyped in a rush... I agree that the include command (in SamB's version) is better, though it is a bit more complicated and specific to rsync while the ** might become handy in other situations as well. Commented Sep 16, 2010 at 19:10
  • Bash 4 has adopted the same feature (it has to be enabled with shopt -s globstar). Oh, and you don't need rsync here, cp will do. On some systems, if there are a lot of files, it helps to do cd ~/LaTeX && cp -p **/*.pdf ~/Output to avoid a “command line too long” error. Commented Sep 29, 2010 at 17:34
  • Note that rsync's patterns used in the include and exclude filters also have a ** that does the same thing. You can escape *'s from other shells by putting them in quotation marks.
    – Dan Pritts
    Commented Feb 4, 2015 at 16:28

You can use find and an intermediate list of files (files_to_copy) to solve your issue. Make sure you're in your home directory, then:

find LaTeX/ -type f -a -iname "*.pdf" > files_to_copy && rsync -avn --files-from=files_to_copy ~/ ~/Output/ && rm files_to_copy

Tested with Bash.

  • I think that find is the most robust solution, but I would opt for either using find's -exec option or using xargs. Something like: find LaTeX/ -type f -iname "*.pdf" -print0 | xargs -0 -I{} rsync -avn {} Output/
    – Steven D
    Commented Sep 27, 2010 at 17:09
  • Yeah... I'd suggest find as well... though I imagine rsync must be able to do this.
    – gabe.
    Commented Sep 28, 2010 at 19:49
  • This is a neat solution to a harder problem as well: presumably I could use this to exclude files whose document class is standalone or which don't have a .tex file with the same name, since these will be images included in some document...
    – Seamus
    Commented Sep 29, 2010 at 12:41
  • rsync's --files-from option accepts reading from stdin (--files-from=-). This would work: find LaTeX/ -type f -a -iname "*.pdf" | rsync -avn --files-from=- ~/ ~/Output/ Commented Sep 20, 2012 at 16:15

Judging by the "INCLUDE/EXCLUDE PATTERN RULES" section of the manpage, the way to do this is

rsync -avn --include="*/" --include="*.pdf" ~/LaTeX/ ~/Output/

The critical difference between this and kbyrd's answer is the --include="*/" flag, which tells rsync to go ahead and copy any directories it finds, whatever they are named. This is needed because rsync will not recurse into a subdirectory unless it has been instructed to copy that subdirectory.

Also, note that the quotation marks prevent the shell from trying to expand the patterns to filenames relative to the current directory, and doing one of the following:

  1. Succeeding and messing up your filter (not too likely in the middle of a flag like that, though you really never know when someone will make a file named --include=foo.pdf ...)

  2. Failing, and potentially producing an error instead of running the command (as you've discovered zsh does by default).
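Point 1 can be reproduced directly. This sketch (POSIX shell; the scratch directory is made up) creates a file literally named --include=foo.pdf and compares what the shell hands to a command for the unquoted and quoted forms:

```shell
#!/bin/sh
# Sketch: an unquoted filter pattern is globbed by the shell first.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
touch -- '--include=foo.pdf'

unquoted=$(printf '%s\n' --include=*.pdf)   # shell replaces the glob
quoted=$(printf '%s\n' '--include=*.pdf')   # passed through literally
printf 'unquoted: %s\nquoted:   %s\n' "$unquoted" "$quoted"
```

In bash the unmatched case would silently pass the pattern through; zsh's default is to stop with "no matches found", which is the error in the question.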

  • So this will copy only the PDFs and the directory structure, while kbyrd's will copy the files but ignore the structure?
    – Seamus
    Commented Sep 17, 2010 at 9:29
  • Hmm. This actually still seems to try to copy everything, I guess because that's what it does without the filter, so including extra stuff that is already in there doesn't change anything. If you see what I mean...
    – Seamus
    Commented Sep 17, 2010 at 9:33
  • You need --exclude="*" after the --include="*.pdf", or this will transfer everything.
    – jmanning2k
    Commented Sep 28, 2010 at 20:25
  • @jmanning2k: Ah. Good to know!
    – SamB
    Commented Sep 29, 2010 at 21:18

This is my preferred solution:

find source_dir -iname '*.jpg' -print0 |  rsync -0 -v --files-from=- . destination_dir/

The find command is easier to understand than the include/exclude rules of rsync :-)

If you want to copy only pdf files, just change .jpg to .pdf


How about this:

rsync -avn --include="*.pdf" ~/LaTeX/ ~/Output/
  • No, man rsync puts the filter after the options and before the sources/destinations. I tried this and it didn't work.
    – Seamus
    Commented Sep 16, 2010 at 16:06
  • Your way finds .pdf files in the current folder, but not recursively, as I want. (The -a option is for archive, and among other things it makes the copying recursive.)
    – Seamus
    Commented Sep 16, 2010 at 16:07
  • Ooops, my bad. I updated my answer.
    – kbyrd
    Commented Sep 16, 2010 at 16:43
  • +1 for being so close, and giving me a clue about how to find the relevant material in the manual page. (Hopefully I even got it right. :-)
    – SamB
    Commented Sep 16, 2010 at 19:04

Here is something that should work without using find. The difference from answers already posted is the order of the filter rules. Filter rules in an rsync command work a lot like iptables rules: the first rule that a file matches is the one that is used. From the manual page:

As the list of files/directories to transfer is built, rsync checks each name to be transferred against the list of include/exclude patterns in turn, and the first matching pattern is acted on: if it is an exclude pattern, then that file is skipped; if it is an include pattern then that filename is not skipped; if no matching pattern is found, then the filename is not skipped.

Thus, you need a command as follows:

rsync -avn --include="**.pdf" --exclude="*" ~/LaTeX/ ~/Output/

Note the "**.pdf" pattern. According to the man page:

if the pattern contains a / (not counting a trailing /) or a "**", then it is matched against the full pathname, including any leading directories. If the pattern doesn’t contain a / or a "**", then it is matched only against the final component of the filename. (Remember that the algorithm is applied recursively so "full filename" can actually be any portion of a path from the starting directory on down.)

In my small test, this does work recursively down the directory tree and only selects the pdfs.

  • How exactly did you test? According to my understanding of the documentation and my experimental verification, your command should only copy *.pdf in the top-level directory (but not ~/LaTeX/foo/bar.pdf). Commented Sep 28, 2010 at 19:25
  • @Gilles Crud. You are right. I swore I tested this and it worked, but I can't seem to recreate it. And now that I actually read the man page that I quoted, it makes sense that it doesn't work. Grumble.
    – Steven D
    Commented Sep 28, 2010 at 20:10
  • Well, I figured out where my test was wrong. My "small test" was on a directory that has .tex and .pdf files of my own. I then created a "test" subdirectory and a test.pdf and test.tex in that subdir. However, I failed to notice that there was a test.pdf in my top-level dir, likely left over from some quick one-off LaTeX experiment I did.
    – Steven D
    Commented Sep 28, 2010 at 20:14
  • I still don't understand the **. It would be nice to have an example of it. ;)
    – buhtz
    Commented Oct 6, 2017 at 9:59

In an update to @Gilles' answer, please consider that with current versions (>=3.x.x) the include options must come before the exclude options in order to build the correct file list. It is also my personal best practice to put the "include all subdirectories" instruction generally first and then the file pattern:

rsync -avh --include='*/' --include='file-pattern' --exclude='*' /sourcedir/ /targetdir/

i.e. in your case:

rsync -avh --include='*/' --include='*.pdf' --exclude='*' ~/LaTeX/ ~/Output/

Further explanation can also be drawn from the manual at https://www.samba.org/ftp/rsync/rsync.html under the headline "FILTER RULES":

Note that, when using the --recursive (-r) option (which is implied by -a), every subdir component of every path is visited left to right, with each directory having a chance for exclusion before its content. In this way include/exclude patterns are applied recursively to the pathname of each node in the filesystem's tree (those inside the transfer). The exclude patterns short-circuit the directory traversal stage as rsync finds the files to send.

For instance, to include "/foo/bar/baz", the directories "/foo" and "/foo/bar" must not be excluded. Excluding one of those parent directories prevents the examination of its content, cutting off rsync's recursion into those paths and rendering the include for "/foo/bar/baz" ineffectual (since rsync can't match something it never sees in the cut-off section of the directory hierarchy).

The concept path exclusion is particularly important when using a trailing '*' rule. For instance, this won't work:

+ /some/path/this-file-will-not-be-found
+ /file-is-included
- *

This fails because the parent directory "some" is excluded by the '*' rule, so rsync never visits any of the files in the "some" or "some/path" directories. One solution is to ask for all directories in the hierarchy to be included by using a single rule: "+ */" (put it somewhere before the "- *" rule), and perhaps use the --prune-empty-dirs option. Another solution is to add specific include rules for all the parent dirs that need to be visited. For instance, this set of rules works fine:

+ /some/
+ /some/path/
+ /some/path/this-file-is-found
+ /file-also-included
- *

Here are some examples of exclude/include matching:

"- *.o" would exclude all names matching *.o
"- /foo" would exclude a file (or directory) named foo in the transfer-root directory
"- foo/" would exclude any directory named foo
"- /foo/*/bar" would exclude any file named bar which is at two levels below a directory named foo in the transfer-root directory
"- /foo/**/bar" would exclude any file named bar two or more levels below a directory named foo in the transfer-root directory
The combination of "+ */", "+ *.c", and "- *" would include all directories and C source files but nothing else (see also the --prune-empty-dirs option)
The combination of "+ foo/", "+ foo/bar.c", and "- *" would include only the foo directory and foo/bar.c (the foo directory must be explicitly included or it would be excluded by the "*")

The following modifiers are accepted after a "+" or "-":

A / specifies that the include/exclude rule should be matched against the absolute pathname of the current item. For example, "-/ /etc/passwd" would exclude the passwd file any time the transfer was sending files from the "/etc" directory, and "-/ subdir/foo" would always exclude "foo" when it is in a dir named "subdir", even if "foo" is at the root of the current transfer.
A ! specifies that the include/exclude should take effect if the pattern fails to match. For instance, "-! */" would exclude all non-directories.
A C is used to indicate that all the global CVS-exclude rules should be inserted as excludes in place of the "-C". No arg should follow.
An s is used to indicate that the rule applies to the sending side. When a rule affects the sending side, it prevents files from being transferred. The default is for a rule to affect both sides unless --delete-excluded was specified, in which case default rules become sender-side only. See also the hide (H) and show (S) rules, which are an alternate way to specify sending-side includes/excludes.
An r is used to indicate that the rule applies to the receiving side. When a rule affects the receiving side, it prevents files from being deleted. See the s modifier for more info. See also the protect (P) and risk (R) rules, which are an alternate way to specify receiver-side includes/excludes.
A p indicates that a rule is perishable, meaning that it is ignored in directories that are being deleted. For instance, the -C option's default rules that exclude things like "CVS" and "*.o" are marked as perishable, and will not prevent a directory that was removed on the source from being deleted on the destination.
An x indicates that a rule affects xattr names in xattr copy/delete operations (and is thus ignored when matching file/dir names). If no xattr-matching rules are specified, a default xattr filtering rule is used (see the --xattrs option).

To generate a directory containing only headers (../include) from inside the source directory:

rsync -avh --prune-empty-dirs --exclude="build" --include="*/" --include="*.h" --exclude="*" ./* ../include/

This prunes all empty directories and excludes the build directory.


For those who want a solution that does not copy the original directory structure (i.e. dumps all PDFs into one directory), this should work:

find SRC_DIR/ -type f -name '*.pdf' -print0 | xargs -0 -I{} cp {} DEST_DIR/
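If you prefer to avoid xargs, find's -exec does the same flattening. A sketch with made-up paths; -iname also catches upper-case .PDF extensions, and each path reaches cp as a single argument, so spaces in names are harmless:

```shell
#!/bin/sh
# Sketch: flatten every PDF under SRC_DIR into DEST_DIR (hypothetical paths).
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/SRC_DIR/proj a/figs" "$work/DEST_DIR"
touch "$work/SRC_DIR/proj a/report.pdf" "$work/SRC_DIR/proj a/figs/plot.PDF" \
      "$work/SRC_DIR/proj a/report.tex"

# One cp per file; the directory structure is not reproduced.
find "$work/SRC_DIR" -type f -iname '*.pdf' -exec cp -- {} "$work/DEST_DIR/" \;
ls "$work/DEST_DIR"
```

As with any flattening copy, PDFs that share a name in different subdirectories will overwrite each other in DEST_DIR.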
