
I have a setup in which I sync files between two Linux servers using Unison over an SSH connection. This is implemented by running the following command via cron:

unison -auto -batch PROFILE
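
For illustration, the corresponding crontab entry might look like this (the 04:30 schedule is only a placeholder, not my actual one):

30 4 * * * unison -auto -batch PROFILE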

Changes to the fileset happen almost exclusively on one side (the other system is an off-site replica). They mostly take place through another Unison sync run (triggered manually) with a client machine, and can be from a few hours to a few weeks apart. Thus, conflicts between the two machines are not much of a concern in practice, and a delay of up to 24 hours for changes to get propagated to the other side is acceptable.

Reasons I am running Unison as a cron job rather than with -repeat (presumably as a systemd service) are:

  • Predictable sync times, as the cron job is scheduled at a time when I am not expecting any manual sync operations from the third machine (whereas, say, -repeat 86400 would drift by the duration of the sync operation).
  • Changes mostly happen on server A, while the server-to-server sync job is triggered by server B (as it is easier, network-wise, if server B initiates the connection). Thus, as I understand it, -repeat watch would not pick up most of the changes, and even with -repeat watch+TIME, I’d be relying on TIME almost exclusively (correct me if I missed something).

When changes do happen, they are usually low in volume. Occasionally, however, the data volume to be transferred is such that a single Unison run lasts several times as long as the interval between two Unison cron jobs (bandwidth between the systems is somewhat constrained). That would mean one Unison process is still running when cron launches the next one on the same set of files.

I take it that Unison has lock mechanisms in place, which presumably prevent the “new” process from messing with anything the “old” one is working on (but correct me if I’m wrong or have missed something). But I’m wondering what the second Unison process would do in that case – I have observed that it does not exit but stays around. Does that mean the second process would wait for the first to finish and only then start synchronization (which would then only include files that changed while the first sync was in progress and therefore failed to sync on the first run)?

Is it safe to launch a second Unison process while another one is still running on the same profile? (If not, what is a recommended way to prevent two concurrent Unison instances if, and only if, they are at risk of interfering with each other?)

What about the resource overhead of unison -repeat watch+TIME vs. occasionally having multiple Unison instances queued up, one running and the others waiting for it to finish?

  • Just a thought, but had you considered rclone instead of unison? The syntax is different and it would need a bit of manual configuration, but it supports essentially the same type of synchronization using its SFTP backend, and it also has options to limit how long a sync operation can run (or how much data to transfer at a time), which would let you completely sidestep this issue (because you could limit it to only running for, say, 20 hours at a time, and therefore could be certain that a previous cron job isn’t still running when a new one starts). Commented May 16 at 14:24

4 Answers


If you want to avoid launching a second instance of Unison when another process is already running, you can control startup using a lockfile via the flock command. This is a general solution: it prevents a second instance of any program from running while another instance is already running.

If you want startup to fail when another instance is running:

flock -n /path/to/lockfile unison -auto -batch PROFILE

If you want startup to block until the existing process exits:

flock /path/to/lockfile unison -auto -batch PROFILE

See the flock(1) man page for more information.
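
For example, a crontab entry combining the non-blocking form with a log message might look like this (the schedule and lock path are placeholders):

30 4 * * * flock -n /path/to/lockfile unison -auto -batch PROFILE || logger -t unison "sync skipped or failed"

Note that the logger branch also fires if unison itself exits with an error, not only when the lock is already held.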

  • Some questions about the inner workings of Unison still remain, but this is definitely a promising candidate for a workaround. I’ll test it.
    – user149408
    Commented May 15 at 17:57
  • As an addendum, by adding -w X in the second case, the subsequent job can be made to wait up to X seconds for the first job to finish and fail if the first job is not done within that time, which may be preferred to waiting indefinitely for the previous job to finish (see the sketch after these comments). Commented May 16 at 14:19
  • @AustinHemmelgarn For the moment, flock -n ... || logger -t unison "skipping because lock file" should work just fine (we’ll see tomorrow if stopping any further instances actually works). Any changes made during that time will get picked up by the next sync job after the first one finishes.
    – user149408
    Commented May 16 at 20:23
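
A sketch of the -w variant from the addendum above (the 3600-second timeout is an arbitrary example):

flock -w 3600 /path/to/lockfile unison -auto -batch PROFILE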

Unison (version 2.51.2) creates its own lock files. Testing it just now, running a unison sync twice from the command line, the second while the first was still running, I got this error (with identifying details ANONYMIZED):

Failed: Destination updated during synchronization
The contents of file FOO/BAR.text have been modified
Failed [FOO/BAR.text]: Destination updated during synchronization
The contents of file foo/bar.text have been modified
UNISON 2.51.2 (OCAML 4.05.0) finished propagating changes at 17:08:02.32 on 15 May 2024
Saving synchronizer state
Fatal error: Warning: the archives are locked.  
If no other instance of unison is running, the locks should be removed.
The file /HOME/ME/.unison/GIANTLONGHASHGIANTLONGHASHBIGYIKES on host LOCALHOST should be deleted
The file /WHEREVER/.unison/GIANTLONGHASHGIANTLONGHASHBIGYIKES on host REMOTEHOST should be deleted
Please delete lock files as appropriate and try again.

So it appears the second instance of unison will exit fatally if it detects these lock files and surmises that another unison sync is currently running.
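
If a crashed run does leave stale locks behind, a cautious cleanup might look like the following sketch (paths assumed; per the comment below, the lock files are assumed to be the lk* files in ~/.unison, and the remote side would need the same treatment over SSH):

# only clear local lock files when no unison process is running
if ! pgrep -x unison >/dev/null; then
    rm -f -- "$HOME"/.unison/lk*
fi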

  • This looks to me as if the second instance initially attempted to sync, skipping one file due to it having been modified during sync, and only then tried to update the archives, failing because they were locked. This may need a large set of files to be noticeable (in my case, around 1 million files, 1 TB in size, about 25,000 files/80 GB changed, over a 3 Mbit/s network). Besides, the only lock files I am seeing (named lk followed by a 128-bit hash) are 9 months old on both sides, even while files are being transferred (on Unison 2.52.1).
    – user149408
    Commented May 16 at 18:47

Reasons I am running Unison as a cron job rather than with -repeat (presumably as a systemd service)

I would actually suggest that systemd is a perfect fit for when you want "cron but single instance". You don't need to keep it as a long-running service – you can use systemd timers in a cron-like manner while taking advantage of systemd to manage instances.

  1. Define a systemd service of Type=exec. This service will be considered active until the process exits. Use the same command line you currently use with cron.
  2. Define a systemd timer to start the service previously defined. This can be set to happen either at a clock (calendar) time (OnCalendar=), or after a period (OnUnitActiveSec= would be relative to last service start time, OnUnitInactiveSec= would be relative to last service stop time). Personally, I like calendar since it's more predictable and won't drift.
  3. Enable and start the timer. Don't enable the service; you don't want it running independently of the timer.
  4. The timer will start the service on schedule. The service manager will ensure only one instance of the service is active. If the service is still running when the timer next fires, it will simply not do anything.

Sample unit files to run Unison as SOMEUSER with profile SOMEPROFILE (except for their .service and .timer suffixes, both unit files must have the same name):

unison.service:

[Unit]
Description=Unison SOMEUSER@SOMEPROFILE
After=network.target

[Service]
User=SOMEUSER
# Type=simple may be used on older versions
Type=exec
# may be relative to the service's root directory specified by RootDirectory=, or the special value "~"
WorkingDirectory=~
# systemd may insist on an absolute path for the executable
# Run unison with -terse to prevent cluttering the system journal
ExecStart=/usr/bin/unison -auto -batch -ui text -terse SOMEPROFILE
Restart=no

unison.timer (fires daily at 13:37; for OnCalendar syntax see man 5 systemd.time, section Calendar Events):

[Unit]
Description=Daily unison SOMEUSER@SOMEPROFILE

[Timer]
OnCalendar=*-*-* 13:37

[Install]
WantedBy=timers.target

Drop these two unit files into /etc/systemd/system/, then run:

sudo systemctl enable unison.timer
sudo systemctl start unison.timer

You can test the setup by running sudo systemctl start unison.service – this requires only the service unit, not the timer unit. This is also a way to start sync out of schedule. Unison output will be written to the system journal with unison as a tag.
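
To check on the setup afterwards, the usual systemd tooling applies:

journalctl -u unison.service          # Unison's output as captured by the journal
systemctl list-timers unison.timer    # last and next activation times of the timer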

  • This is a very elegant solution, and definitely what I would do. You can even use user units if you'd prefer to not run as root.
    – onwsk8r
    Commented May 17 at 17:41
  • Looks very elegant indeed, and gives me the additional benefit of being able to start synchronization manually when I need it, without having to worry about collisions with the next scheduled run. Running as a non-root user is pretty much mandatory with Unison, though – it is not intended to run as root.
    – user149408
    Commented May 17 at 18:29
  • @user149408 In that case you could specify User= in the service unit with a username or uid, and it will start the service as that user. Or as @onwsk8r mentioned, you can define both units (service and timer) entirely under a user if you prefer, though how exactly that's configured (and whether it starts on boot vs login) would depend on your distro. Personally I just define all my services as system services and configure them to run as a user if necessary.
    – Bob
    Commented May 18 at 2:30
  • @user149408 I'm not sure WorkingDirectory can take a ~ alias, you probably need the absolute path. IIRC it's normally recommended that ExecStart uses an absolute path too, though not strictly required.
    – Bob
    Commented May 18 at 15:37
  • @user149408 I suspect you may have meant simple, since single does not appear to exist. That said, exec is a better default if possible, since it has better error handling. Good catch on the WorkingDirectory.
    – Bob
    Commented May 23 at 0:10

I also use flock in cron job shell scripts to avoid running more than one at a time. I use a different approach from the one suggested in a previous answer. I’ll show the code and describe the benefits and drawbacks after.

A Bash code snippet; it might be usable in other shells with minor changes:

script_name='my-cron-command'
lock_file="/tmp/${script_name}.lock"

# this flock solution uses a subshell
(
  # get an exclusive, non-blocking lock on the subshell's file descriptor 300 (see below)
  flock -x -n 300 || {
    echo "${script_name}: Warning: cannot get lock on file ${lock_file}.  Exiting..." >&2
    exit 0
  }

  # lock was successful, okay to run my command

  ### command here ###

  # subshell's file descriptor 300 is open on the lockfile
) 300>"${lock_file}"

Drawbacks:

  • The script won't wait until the lock is released by a previous script.
  • Invokes an extra subshell process (an alternative without the subshell is sketched below).

Benefits:

  • One hung cron job won't cause later jobs to accumulate waiting for the lock, growing the process list and consuming RAM.
  • The lock is released when the subshell exits, so there is no need for a separate unlock command, and no stale locks are left behind by crashed commands/scripts. (A hung command can be found with fuser /path/to/lock.file.)
  • When the lockfile is already locked, the script can log a meaningful message before it exits.
  • Not limited to locking just one command. The script can perform other file/variable manipulation between obtaining the lock and running the command.

File descriptor 300 isn't special, it's just a high-numbered descriptor that's not likely to be used by another part of the script.
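
If the extra subshell is a concern, flock can also take the lock on a file descriptor opened with exec in the current shell; the descriptor (and with it the lock) is then held until the script exits. A sketch using the same variables as above:

# open fd 300 on the lockfile in the current shell, then lock it
exec 300>"${lock_file}"
flock -x -n 300 || {
  echo "${script_name}: Warning: cannot get lock on file ${lock_file}.  Exiting..." >&2
  exit 0
}

### command here ###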

As with everything else, it's a trade-off between the behaviors you want and the ones you don't want.
