
We have a process that runs nightly on multiple Linux and Darwin (Macintosh) systems to back up filesystem objects. We are also running Unison file synchronization with the -repeat=watch option, so that synchronization is as up-to-date as reasonably possible.

Our goal is to gracefully shut down the unison processes before the backup window and restart them afterward. Currently, we use the relatively ungraceful method of running kill -SIGTERM on the system that started each synchronization process. While this works, it's not ideal, and I hope there's an orderly way to do this that I might have missed. (One has to pore through the docs, for example, to find that SIGUSR1 can be used to close a logfile, so I hope there is similarly another SIG* signal that can be used for this purpose.)

It would also be nice to incorporate this into our system-shutdown process.

Can this currently be done? (The Mac version is "unison version 2.53.3 (ocaml 4.14.0)"; the Linux (Ubuntu) version is "unison version 2.48.4".) If there is no graceful way, I will make a feature request.

1 Answer

Per the Unison manual, the signal for politely stopping continuous synchronization (watching) is SIGUSR2.

6.10 Interrupting a Synchronization

When not synchronizing continuously, the text interface terminates when synchronization is finished normally or due to a fatal error occurring. In the text interface, to interrupt synchronization before it is finished, press “Ctrl-C” (or send signal SIGINT or SIGTERM). This will interrupt update propagation as quickly as possible but still complete proper cleanup. If the process does not stop even after pressing “Ctrl-C” then keep doing it repeatedly. This will bypass cleanup procedures and terminate the process forcibly (similar to SIGKILL). Doing so may leave the archives or replicas in an inconsistent state or locked. When synchronizing continuously (time interval repeat or with filesystem monitoring), interrupting with “Ctrl-C” or with signal SIGINT or SIGTERM works the same way as described above and will additionally stop the continuous process. To stop only the continuous process and let the last synchronization complete normally, send signal SIGUSR2 instead.

Unison has a SIGTERM handler so that it can exit gracefully. In general, for file-related tools, cleaning up means removing temporary files whose copies had not finished, plus releasing any locks or other resources.
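A generic Python sketch of the temporary-file half of that cleanup, as an illustration of the common pattern and not of how Unison itself is implemented: each copy is written to a hidden temporary file in the destination directory, renamed into place only once it is complete, and deleted if the copy is interrupted. The names atomic_write and raise_on_sigterm and the ".partial-" prefix are hypothetical.

```python
# Generic illustration of temp-file cleanup on interruption.
# Names (atomic_write, raise_on_sigterm, ".partial-") are hypothetical, not Unison's code.
import os
import signal
import sys
import tempfile
from contextlib import contextmanager


def raise_on_sigterm(signum, frame):
    """Turn SIGTERM into SystemExit so that finally/except cleanup code still runs."""
    raise SystemExit(128 + signum)


@contextmanager
def atomic_write(dest_path):
    """Write into a temp file beside dest_path; rename it into place only on success.

    If the write is interrupted (exception, Ctrl-C, or a SIGTERM converted by
    raise_on_sigterm), the partial temp file is deleted and dest_path is untouched.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # Same directory as the destination, so os.replace() stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".partial-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            yield tmp
            tmp.flush()
            os.fsync(tmp.fileno())  # make the data durable before the rename
        os.replace(tmp_path, dest_path)  # atomic rename on POSIX filesystems
    except BaseException:
        # Interrupted or failed: remove the half-written copy, then re-raise.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


if __name__ == "__main__":
    # Hypothetical usage: copy stdin to the path given on the command line.
    signal.signal(signal.SIGTERM, raise_on_sigterm)
    with atomic_write(sys.argv[1]) as out:
        out.write(sys.stdin.buffer.read())
```

Because the rename happens only after the data is flushed and fsynced, anything reading the destination sees either the old file or the complete new one, never a half-written copy; an interrupted run leaves nothing behind to clean up by hand.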

Implement and test a script that sends SIGUSR2, waits for a while, and, if Unison is still running, sends SIGTERM. For example, if Unison runs as a systemd service on Linux, the ExecStop script could send SIGUSR2, wait a bit, and check whether syncing has finished; the usual SIGTERM from the service manager would then ensure that it shuts down. Afterward, confirm that your sets of files are intact: the sync program being crash-consistent does not necessarily mean your files are all from the same point in time.
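A minimal sketch of such a stop script in Python, assuming the PID of the unison -repeat watch process is already known (for example from pgrep -x unison, or from systemd's $MAINPID in an ExecStop= line). The function names, the 300 s and 30 s grace periods, and the polling interval are hypothetical choices, not anything Unison defines.

```python
# Hypothetical stop helper for a "unison -repeat watch" process.
# Names and timeouts are assumptions; Unison itself defines only the signals.
import os
import signal
import sys
import time


def process_alive(pid):
    """Best-effort liveness check for a PID (not necessarily our child)."""
    try:
        # If pid is our own child and has exited, reap it so it is not a zombie.
        reaped, _status = os.waitpid(pid, os.WNOHANG)
        if reaped == pid:
            return False
    except ChildProcessError:
        pass  # not our child: fall back to the signal-0 probe below
    try:
        os.kill(pid, 0)  # signal 0 checks existence without delivering anything
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def wait_for_exit(pid, timeout, poll=0.5):
    """Poll until pid disappears or timeout seconds pass; True if it exited."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not process_alive(pid):
            return True
        time.sleep(poll)
    return not process_alive(pid)


def stop_gracefully(pid, usr2_timeout=300.0, term_timeout=30.0, poll=0.5):
    """Send SIGUSR2 (finish the current sync, then stop watching); escalate to
    SIGTERM (interrupt with cleanup) if the process outlives usr2_timeout.

    Returns the name of the signal after which the process exited,
    "not running" if it was already gone, or None if it is still alive.
    """
    for sig, timeout in ((signal.SIGUSR2, usr2_timeout),
                         (signal.SIGTERM, term_timeout)):
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            return "not running"
        if wait_for_exit(pid, timeout, poll):
            return sig.name
    return None  # deliberately no SIGKILL: that can leave archives locked


if __name__ == "__main__":
    # Hypothetical usage: stop_unison.py <pid>
    result = stop_gracefully(int(sys.argv[1]))
    print(result or "still running")
    sys.exit(0 if result else 1)
```

Check the Unison version before relying on this: per the comment below, 2.48.4 appears to predate SIGUSR2 support, and a process that installs no SIGUSR2 handler presumably gets the default POSIX action for that signal, which terminates it immediately with no cleanup at all. Polling with signal 0 can also be fooled if the PID is reused, which is acceptable for a sketch but worth knowing.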


Seems like you have a backup method independent of the sync process, which is good.

I do not consider a sync like this alone to be a sufficient backup. Good backups are offline, immutable, and stored on different media. If an accident or malware were to write bad data to one side, it would be synced to the other. And sync conflict resolution could introduce complications that merely making an archive of one tree would not.

An independent means of backup also provides assurance that your changes to the file sync procedure do not cause data loss. In the worst case, restore from the backup archives, which are safely protected from writes.

  • Ah, it looks like they added SIGUSR2 sometime after 2.48, which is my biggest concern, since only one system is running that Mac version. Previously, I tried SIGTERM and it was messy: it left a "terminated" message and did not deal with child processes and the like, leaving a bit of a cleanup challenge behind. I have now built unison 2.53.2 (OCaml) from source, and this seems to be working correctly. Thanks so much for the thorough answer. Yes, agreed: a Unison copy does not equal a backup. We have a true offsite backup solution to deal with recovery issues.
    – Dennis
    Commented Nov 14, 2023 at 20:08
