
I want to migrate an LMDB from my local machine to another remote machine, but there is some weirdness about the file size. According to the filesystem, an LMDB is a directory containing two files: data.mdb and lock.mdb.

The output of ls -altoh lmdb indicates that data.mdb has a file size of 4T, which matches the map_size parameter I used to create the LMDB. This just means that when the DB is opened, the OS will memory map the file, giving it 4T of virtual address space. The output of du -hs lmdb indicates that the lmdb is taking up ~900MB of disk, which agrees with the database size reported by python -mlmdb -e lmdb stat.
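The gap between what ls and du report can be reproduced by comparing a file's apparent size (st_size, what ls -l shows) with the space actually allocated to it (st_blocks, counted in 512-byte units on Linux and macOS, which is roughly what du reports). Below is a minimal Python sketch, assuming a POSIX system where os.stat exposes st_blocks; the helper name sizes and the demo file name are hypothetical and only for illustration.

```python
import os
import tempfile


def sizes(path):
    """Return (apparent_bytes, allocated_bytes) for path.

    apparent_bytes is the logical length (what `ls -l` shows);
    allocated_bytes approximates what `du` counts, since st_blocks
    is reported in 512-byte units on Linux and macOS.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "demo.mdb")  # hypothetical file name
        # Extend the file to 64 MiB without writing any data, which
        # leaves a sparse file: large apparent size, few or no blocks.
        with open(path, "wb") as f:
            f.truncate(64 * 1024 * 1024)
        apparent, allocated = sizes(path)
        print(f"apparent={apparent} allocated={allocated}")
```

On filesystems that support sparse files, the allocated figure stays far below the apparent one, which is the same pattern as data.mdb here.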

When I do a local copy, cp -r lmdb lmdb_copy, it works as expected: 900MB of data is copied. The same happens when I use scp for a local copy: scp -r lmdb lmdb_copy2.

However, when I do a remote copy, scp -r lmdb user@remotehost:~/lmdb_copy, scp attempts to copy 4T of data, as indicated by the progress bar. I stopped the scp after 2GB of data had been transferred.

On the remote machine, ls and du both report 2GB as the size of the LMDB. python -mlmdb -e lmdb_copy stat reports the correct size of 900MB, and all of the entries are there; I've verified that I can print out all of the keys and they are correct.

With this background, my question is, why does scp attempt to copy all 4T of the memory map size? Ideally, I'd like to let scp do its thing in the background without having to manually kill it.

Answer


You could try using rsync to do the copy. Its --sparse option is meant to handle sparse files efficiently. Something like:

rsync --rsh=ssh --archive --sparse lmdb/ user@remotehost:~/lmdb_copy

As an aside, here is some insight into why scp works locally but not over a network: when scp sees that it's a local-to-local copy, it just passes the request directly to the cp command. Monitoring an scp command's system calls, I caught it doing this:

execve("/bin/sh", ["sh", "-c", "exec cp -r foo bah"], [/* 20 vars */])
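That hand-off matters because GNU cp, by default (--sparse=auto), detects holes in a sparse source and recreates them in the destination, so only the ~900MB of real data gets written. A remote scp, by contrast, reads the file through its full apparent length and sends the holes as zeros. The following is a minimal Python sketch of a hole-preserving copy, not cp's actual implementation: it assumes Linux or macOS, where os.SEEK_DATA and os.SEEK_HOLE are available, and the helper name sparse_copy is hypothetical. It writes only the data extents of the source, then sets the destination's length so the gaps remain holes.

```python
import errno
import os
import tempfile


def sparse_copy(src, dst, chunk=1 << 20):
    """Copy src to dst, writing only data extents so holes stay holes."""
    fin = os.open(src, os.O_RDONLY)
    try:
        fout = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            size = os.fstat(fin).st_size
            offset = 0
            while offset < size:
                try:
                    # Jump to the next byte that is backed by real data.
                    start = os.lseek(fin, offset, os.SEEK_DATA)
                except OSError as exc:
                    if exc.errno == errno.ENXIO:
                        break  # only a hole remains between offset and EOF
                    raise
                # This data extent ends where the next hole (or EOF) begins.
                end = os.lseek(fin, start, os.SEEK_HOLE)
                pos = start
                while pos < end:
                    buf = os.pread(fin, min(chunk, end - pos), pos)
                    if not buf:
                        break
                    # Writing at an offset past EOF leaves a hole behind it.
                    os.pwrite(fout, buf, pos)
                    pos += len(buf)
                offset = end
            # Set the full apparent size; an unwritten tail stays a hole.
            os.ftruncate(fout, size)
        finally:
            os.close(fout)
    finally:
        os.close(fin)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "src.mdb")
        dst = os.path.join(d, "dst.mdb")
        with open(src, "wb") as f:
            f.write(b"header")
            f.seek(32 * 1024 * 1024)  # leave a 32 MiB gap (a hole)
            f.write(b"tail")
        sparse_copy(src, dst)
        st = os.stat(dst)
        print(st.st_size, st.st_blocks * 512)
```

On a filesystem without hole support, the kernel reports the whole file as data, so the sketch degrades to a plain full copy rather than failing.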
  • Thanks, I'll try that. I found that the mdb_copy tool makes a local copy of the LMDB in which data.mdb isn't sparse (ls shows the correct file size), so scp then works as intended.
    – waldol1
    Commented Sep 10, 2015 at 15:50
  • Hmm, it worked better, but it's still not what I want. A bit more than 900MB (971MB) of data got transferred (as shown by ls/du on the remote machine), but rsync was still running (and reporting ridiculous transfer rates of 1000GB/s), even though the file size on the remote machine had stopped increasing.
    – waldol1
    Commented Sep 18, 2015 at 14:53
  • rsync's reported transfer rate tries to show you the net rate, so it goes very high for sparse files (just as it does for unchanged sections). Did you let rsync finish?
    – mykel
    Commented Sep 20, 2015 at 12:37
