5

I installed mysql and server via yum on my 64bit CentOS 5 server. It starts up fine but when I try to stop it it stalls out and then I have to "Ctrl-C" it. Then I run a "service mysqld status" and it shows:

mysqld dead but subsys locked

I run ps aux and mysql is nowhere to be found. Starting mysqld again via "service mysqld start" works fine. Attempting to stop it creates the same problem.

I then realized that /var/lock/subsys/mysqld still exist. When running mysqld I checked /var/run/mysqld/mysqld.pid and it matched with the pid of the running service.

I tried reinstalling mysql and deleting all files and configs but to no avail.

What to do?

EDIT:

I added some echo statements in the /etc/init.d/mysqld file, particularly in the stop function:

stop(){
        if [ ! -f "$mypidfile" ]; then
            # not running; per LSB standards this is "ok"
            action $"Stopping $prog: " /bin/true
            return 0
        fi  
        echo "beginning stop sequence"
        MYSQLPID=`cat "$mypidfile"`
        if [ -n "$MYSQLPID" ]; then
            /bin/kill "$MYSQLPID" >/dev/null 2>&1
            echo "killing pid $MYSQLPID"
            ret=$?
            if [ $ret -eq 0 ]; then
                echo "return code $ret after kill attempt"
                TIMEOUT="$STOPTIMEOUT"
                echo "timeout is set to $STOPTIMEOUT"
                while [ $TIMEOUT -gt 0 ]; do
                    /bin/kill -0 "$MYSQLPID" >/dev/null 2>&1 || break
                    sleep 1
                    let TIMEOUT=${TIMEOUT}-1
                    echo "timeout is now $TIMEOUT"
                done
                if [ $TIMEOUT -eq 0 ]; then
                    echo "Timeout error occurred trying to stop MySQL Daemon."
                    ret=1
                    action $"Stopping $prog: " /bin/false
                else
                    echo "attempting to del lockfile: $lockfile"
                    rm -f $lockfile
                    rm -f "$socketfile"
                    action $"Stopping $prog: " /bin/true
                fi
            else
                action $"Stopping $prog: " /bin/false
            fi
        else
            # failed to read pidfile, probably insufficient permissions
            action $"Stopping $prog: " /bin/false
            ret=4
        fi
        return $ret
}

This is the result I get when I try to stop the service:

[root@server]# service mysqld stop
beginning stop sequence
killing pid 9145
return code 0 after kill attempt
timeout is set to 60
timeout is now 59
timeout is now 58
timeout is now 57
timeout is now 56
timeout is now 55
timeout is now 54
timeout is now 53
timeout is now 52
timeout is now 51
timeout is now 50
timeout is now 49

From looking at the code it seems to me that it will never break out of that while loop and will not be able to delete the lock file. Am I interpreting this wrong? I checked the same file on my other server and it uses the same code. I am dumbfounded.

EDIT: In the while loop part

 /bin/kill -0 "$MYSQLPID" >/dev/null 2>&1 || break

For some reason it is not recognizing the return code. When service mysqld stop is called, the process has already been killed but not sure why it isn't allowing the loop to break out.

EDIT: Further testing shows some weird behavior between calling /bin/kill and just calling kill, they apparently return different codes, why??????:

[root@server]# /bin/kill 25200
kill 25200: No such process
[user@server]# echo ${?}
0
[root@server]# kill 25200
-bash: kill: (25200) - No such process
[root@server]# echo ${?}
1

EDIT: I logged on as a non-root user and tried executing "kill" and "/bin/kill" with a surprising results:

[notroot@server ~]$ kill -0 23232
-bash: kill: (23232) - No such process
[notroot@server ~]$ echo $?
1
[notroot@server ~]$ /bin/kill -0 23232
kill 23232: No such process
(No info could be read for "-p": geteuid()=501 but you should be root.)
[notroot@server ~]$ echo $?
0

The "No info could be read" error does not show up in my other servers when executing kill and bin/kill as a nonroot user.

EDIT: Added logging described by quanta and also checked out the mysql log:

After a start and stop, mysql log shows this:

110918 00:11:28 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
110918  0:11:28 [Note] Plugin 'FEDERATED' is disabled.
110918  0:11:28  InnoDB: Initializing buffer pool, size = 16.0M
110918  0:11:28  InnoDB: Completed initialization of buffer pool
110918  0:11:29  InnoDB: Started; log sequence number 0 44233
110918  0:11:29 [Warning] 'user' entry 'root@server' ignored in --skip-name-resolve mode.
110918  0:11:29 [Warning] 'user' entry '@server' ignored in --skip-name-resolve mode.
110918  0:11:29 [Note] Event Scheduler: Loaded 0 events
110918  0:11:29 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.58-ius'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Distributed by The IUS Community Project
110918  0:11:34 [Note] /usr/libexec/mysqld: Normal shutdown

110918  0:11:34 [Note] Event Scheduler: Purging the queue. 0 events
110918  0:11:34  InnoDB: Starting shutdown...
110918  0:11:39  InnoDB: Shutdown completed; log sequence number 0 44233
110918  0:11:39 [Note] /usr/libexec/mysqld: Shutdown complete

110918 00:11:39 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended

Then in the tmp/mysql.log:

kill 23080: No such process
kill 23080: No such process
kill 23080: No such process
kill 23080: No such process
kill 23080: No such process
kill 23080: No such process
kill 23080: No such process
kill 23080: No such process
kill 23080: No such process
kill 23080: No such process

I stopped the stop process midway so I don't have to wait for the timeout. Looks like the process was killed. The problem, I think, is still with the different return codes from "kill" and "/bin/kill"

5
  • Replace >/dev/null at the first attempt to kill with >/tmp/mysql.log and the second in the while loop with >>/tmp/mysql.log and try again to see what the logs say. Also take a look at /var/log/mysqld.log.
    – quanta
    Commented Sep 17, 2011 at 16:03
  • Added the results above. I still believe something weird is happening with the kill and /bin/kill not returning the same exit codes. Commented Sep 17, 2011 at 16:17
  • The first attempt returns 0 because you had switched to normal user from root. Are you always get return status 0 when trying /bin/kill with normal user? What is the output when running in debug mode sh -x /etc/init.d/mysqld stop?
    – quanta
    Commented Sep 17, 2011 at 17:10
  • As you see from one of my edits above, running /bin/kill as a non-root user returns this error: (No info could be read for "-p": geteuid()=501 but you should be root.). But this does not happen in my other server when I run /bin/kill as a non-root user. Commented Sep 17, 2011 at 17:23
  • 1
    Bash has a builtin command kill. This is helpful in case the system is swamped enough that it can't spawn any more processes, then you can still kill rogue processes if you're logged in. Thus, when you run kill(as opposed to /bin/kill), then you're running the builtin. This should explain the difference between running kill and /bin/kill, where you call a specific binary.
    – Kevin M
    Commented Sep 17, 2011 at 23:34

1 Answer 1

2

First things first: a very well-done, systematic and thorough debugging, good job.

On my RHEL 5.6 box I always get a return code of 1 if I try to kill a non-existing pid. I tried as both root and a non-privileged user, both with full path and a with just the command name. I also get only terse kill XXX: No such process, with no elaborate error messages.

It may be a good idea to run rpm -Vv util-linux and see if somebody didn't replace /bin/kill with a new and improved version. Even if rpm verification says the file is pristine, I'd try renaming /bin/kill and copying over a binary from a working machine. If the file replacement helps and you don't uncover a legitimate the source of the change, then regardless of output of rpm verification I'd assume the machine was compromised.

3
  • Looks like something is up with /bin/kill:. S.5....T /bin/kill Commented Sep 18, 2011 at 3:53
  • Did a quick yum update util-linux and all is well now. You, sir, are a gentleman AND a scholar. Commented Sep 18, 2011 at 4:26
  • Thank you for your courteous words. I think rpm -aV may be in order. util-linux may be not the only affected package, and whoever changed the kill binary doesn't seem to manipulate rpm's base as well. Commented Sep 19, 2011 at 3:52

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .