
Every night I synchronize some folders and run backups with attic to an NFS share at a different site. For this, I have a script that first establishes an OpenVPN connection to that site and then mounts the NFS share before the backup starts.

Occasionally the NFS share becomes unavailable during the backup, which causes the I/O wait to climb over days:

[Image: average load graph (hosted on Imgur)]

As soon as the share becomes available again, the load drops.

Until then, I can't kill the process that causes the load. It just won't go away.

This is very annoying.

How can I prevent this from happening? Can I somehow integrate a timeout or something?

Here is the script that runs every night via cron:

#!/bin/sh
REPOSITORY=/media/offsiteserver_netbackup/system.attic  # no trailing slash at the end of this
NFSMOUNT=/media/offsiteserver_netbackup  # no trailing slash at the end of this
NFSDIR="192.168.178.2:disk2/netbackup"

export PATH=$PATH:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/bin/X11:/usr/games

###############start of script#################   
#exec >> $LOGFILE 2>&1

#simple function that just prints the time and the info you pass to it
echotime () {
  echo "`date +%Y-%m-%d--%H:%M:%S` ----$1---"
}

# simple function to check if openvpn is connected (returns 0 when NOT CONNECTED, 1 when connected)
checkvpn () {
  if ping -c 1 192.168.178.2 > /dev/null 2>&1; then
      echotime "VPN connected"
      return 1
  else
      echotime "VPN not connected"
      return 0
  fi
}  

# simple function to check if nfs is mounted (returns 0 when NOT MOUNTED, 1 when mounted)
checkmount () {
  #http://stackoverflow.com/a/14698865
  #http://stackoverflow.com/a/9422947
  if mount | grep -q "$NFSMOUNT"; then
      echotime "NFS mounted"
      return 1
  else
      echotime "NFS not mounted"
      return 0
  fi
}

echotime "Script Start"

# restart vpn if not connected
if checkvpn; then
  echotime "VPN not connected, attempting to connect"
  /etc/init.d/openvpn restart
  sleep 5
  #check again if it's connected
  if checkvpn; then
     echotime "ERROR: VPN still not connected, exiting \n"
     exit 1
  fi
fi

# mount nfs if not mounted
# if you're not using NFS, you can delete this section altogether
if checkmount; then
  echotime "NFS not mounted, attempting to mount"
  mount -v $NFSDIR $NFSMOUNT -o nolock
  #check again if it's mounted
  if checkmount; then
     echotime "ERROR: NFS still not mounted, exiting \n"
     exit 1
  fi
fi

# Backup all of / except a few excluded directories
# if you're running into issues, add -v after create for verbose mode
# the below / means backup all of root.
echotime "ATTIC CREATE"
attic create --stats                            \
    $REPOSITORY::host.stscode-`date +%Y-%m-%d--%H:%M:%S`  \
    /                                           \
    --exclude /sys                              \
    --exclude /mnt                              \
    --exclude /dev                              \
    --exclude /media                            \
    --exclude /lost+found                       \
    --exclude /proc                             \
    --exclude /run

# Use the `prune` subcommand to keep 23 hourly, 7 daily,
# 2 weekly and 2 monthly archives.
echotime "ATTIC PRUNE"
attic prune -v $REPOSITORY --keep-hourly=23 --keep-daily=7 --keep-weekly=2 --keep-monthly=2

#unmount the NFS folder, I do this b/c
#   if it stays mounted, sometimes servers freak
#   out when rebooting.
# Comment out the 2 lines below if you don't need to unmount every time.
echotime "UNMOUNT"
umount -v $NFSMOUNT

# end of script
echotime "End of Script"

Or maybe nfs is not the way to go here?

I am grateful for any hints on how I can improve this procedure and make it more stable.

2 Answers


This is a "feature" of NFS. When a connection is lost, a hard-mounted share (the default) will try indefinitely to reconnect and, if it is able to, is generally pretty good about picking up where it left off as if nothing had happened. It can be incredibly annoying, however. The surest way to avoid this is to not use NFS. If you don't care about performance (or UNIX file attributes), you could use CIFS/SMB. rsync would also be an option, although it might not work very well with attic.
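
A hedged side note on the retry behaviour: retrying forever is what a `hard` mount does, and nfs(5) on Linux also documents a `soft` option, under which the client returns an I/O error to the application after `retrans` retries, each waiting `timeo` tenths of a second. The sketch below reuses the variable names from the question's script; `NFS_OPTS`, `mount_offsite` and the option values are illustrative assumptions. nfs(5) itself warns that soft timeouts can cause silent data corruption in certain cases, so this is a trade-off rather than a fix.

```shell
#!/bin/sh
# Variables copied from the question's script.
NFSMOUNT=/media/offsiteserver_netbackup
NFSDIR="192.168.178.2:disk2/netbackup"

# Illustrative option values (assumptions, tune for your link):
#   soft       fail the request with an error instead of retrying forever
#   timeo=100  wait 10 s (the unit is tenths of a second) before a retry
#   retrans=3  number of retries before the request is failed
NFS_OPTS="nolock,soft,timeo=100,retrans=3"

# Hypothetical replacement for the plain `mount` call in the script.
mount_offsite () {
  mount -v -o "$NFS_OPTS" "$NFSDIR" "$NFSMOUNT"
}
```

With such a mount, attic would be expected to see an I/O error and exit instead of blocking, so the script should check attic's exit status before pruning.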


A trivial solution would be to start attic as a sub-process and abort it if it takes too long. Some suggestions on how to do this are in this post.
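
One way to sketch the "abort it if it takes too long" idea in the script's own shell is the `timeout` utility from GNU coreutils, assuming it is installed on the backup host. The helper name `run_with_limit` and the time limits are illustrative, not part of the original script.

```shell
#!/bin/sh
# Hypothetical helper: run a command, send SIGTERM once LIMIT has elapsed,
# and SIGKILL 30 seconds later if the command is still alive.
# Relies on `timeout` from GNU coreutils.
run_with_limit () {
  limit=$1
  shift
  timeout -k 30 "$limit" "$@"
  status=$?
  # timeout exits with 124 when the limit was reached
  if [ "$status" -eq 124 ]; then
    echo "command timed out after $limit: $*" >&2
  fi
  return "$status"
}

# Possible use in the backup script (6h is an illustrative limit):
# run_with_limit 6h attic prune -v "$REPOSITORY" --keep-daily=7
```

As the question notes, a process blocked on an unresponsive NFS mount may not react to signals at all, in which case `timeout` cannot end it either.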

If aborting doesn't work (as you've mentioned in your question), you can still run it as a child process and have your main script monitor NFS.
If NFS stops working, you might be able to restart or remount it. (You've got the code for monitoring and restarting already in your script.)

With a little bit of luck, your process will continue when NFS is restarted.
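
The monitoring idea could look roughly like the following sketch. The helper name `supervise` and its argument order are assumptions made for illustration; the health check and the recovery step are passed in as command or function names so that the script's existing checks can be reused.

```shell
#!/bin/sh
# Hypothetical helper: run a job as a child process and, while it runs,
# call a health check every INTERVAL seconds; run a recovery command
# whenever the check fails.
# Usage: supervise INTERVAL CHECK_CMD RECOVER_CMD JOB [ARGS...]
supervise () {
  interval=$1
  check=$2
  recover=$3
  shift 3
  "$@" &                      # start the job in the background
  job_pid=$!
  # kill -0 sends no signal; it only tests whether the child still exists
  while kill -0 "$job_pid" 2>/dev/null; do
    if ! "$check"; then
      echo "health check failed, attempting recovery" >&2
      "$recover"
    fi
    sleep "$interval"
  done
  wait "$job_pid"             # return the job's exit status
}

# Possible use (function names are illustrative):
# vpn_up ()  { ping -c 1 192.168.178.2 > /dev/null 2>&1; }
# vpn_fix () { /etc/init.d/openvpn restart; }
# supervise 60 vpn_up vpn_fix attic prune -v "$REPOSITORY" --keep-daily=7
```

Note that `checkvpn` and `checkmount` in the question return 0 when the connection or mount is missing, so they would need to be inverted before being passed as the health check.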
