
First off, the details.

BEFORE: kernel: 3.2.0-2-amd64, nvidia driver: 295.59

AFTER: kernel: 3.2.0-3-amd64, nvidia driver: 302.17-3

My Debian wheezy is kept up to date at all times. In fact, running apt-get upgrade daily is what got me into this trouble in the first place.

Evidently, after an apt-get upgrade, something "broke" on my Debian -- something related to the build ecosystem and/or DKMS itself.

The NVIDIA driver cannot be built by ANY method recommended in the official wikis, including the official NVIDIA binary installer (a log snippet from that is in one of the updates).

Here's the output of dpkg-reconfigure nvidia-kernel-dkms:

# dpkg-reconfigure nvidia-kernel-dkms

------------------------------
Deleting module version: 302.17
completely from the DKMS tree.
------------------------------
Done.
Loading new nvidia-302.17 DKMS files...
Building only for 3.2.0-3-amd64
Building initial module for 3.2.0-3-amd64
Error!  Build of nvidia.ko failed for: 3.2.0-3-amd64 (x86_64)
Consult the make.log in the build directory
/var/lib/dkms/nvidia/302.17/build/ for more information.

A relevant snippet from /var/lib/dkms/nvidia/302.17/build/make.log follows. The problem is not in the compilation, I can guarantee that.

  LD [M]  /var/lib/dkms/nvidia/302.17/build/nvidia.o
  Building modules, stage 2.
  MODPOST 0 modules
make[1]: Leaving directory `/usr/src/linux-headers-3.2.0-3-amd64'
make: Leaving directory `/var/lib/dkms/nvidia/302.17/build'

And that's it. No explanation of any kind in any other files in the same directory (at least as far as I checked).

Before I ask my questions: I am using nouveau driver now (it's not like I got any choice anyway), but it doesn't work too well for me. I got 3 desktops, constantly playing movies on 1 of them, and being a very busy developer on the other 2. The nouveau driver fails a little bit there (the movies on the second screen get horizontal stripes all the time, the XFCE consoles lag a bit on the scrolling, etc.)

Questions:

  • Should I change my kernel version? Tried 3.2.0-2-amd64 and 3.2.0-3-amd64, to no avail. Trying 3.2.0-3-rt-amd64 makes my machine freeze after a few minutes of operation, so I don't dare install it again.
  • Should I change the version of something in my build environment? (As pointed out in the updates, it turns out this is not just an NVIDIA problem.)
  • Should I assume that my linker is at fault (I am not using gold, I am using ld from the binutils package), and if so, what could I do to make the DKMS method finally work? The problem does seem to manifest itself in the linking phase (and MODPOST shows 0 modules).

On a personal note, this disturbs me on a much deeper level than I usually care to admit. I had great respect for Debian, which is shattered at the moment. C'mon, a simple apt-get upgrade breaks compiling / linking of all out-of-tree kernel drivers?

Extremely disappointing.

UPDATE #1:

I did in fact try to install the official 304.22 NVIDIA drivers; here's the log file. Looks like the linking does indeed fail, doesn't it?

Also, if I try to enable DKMS integration as well, I get a message to the effect that the script cannot determine the current kernel version (text in the 2nd update).

nvidia-installer log file '/var/log/nvidia-installer.log'
creation time: Sat Jul 21 22:59:30 2012
installer version: 304.22

PATH: /usr/local/rvm/gems/ruby-1.9.3-p194/bin:/usr/local/rvm/gems/ruby-1.9.3-p194@global/bin:/usr/local/rvm/rubies/ruby-1.9.3-p194/bin:/usr/local/rvm/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

nvidia-installer command line:
    ./nvidia-installer

Using: nvidia-installer ncurses user interface
-> License accepted.
-> Installing NVIDIA driver version 304.22.
-> There appears to already be a driver installed on your system (version: 304.22).  As part of installing this driver (version: 304.22), the existing driver will be uninstalled.  Are you sure you want to continue? ('no' will abort installation) (Answer: Yes)
-> Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if you install a different kernel later. (Answer: No)
-> Performing CC sanity check with CC="gcc-4.6".
-> Performing CC version check with CC="gcc-4.6".
-> Kernel source path: '/lib/modules/3.2.0-3-amd64/source'
-> Kernel output path: '/lib/modules/3.2.0-3-amd64/build'
-> Performing rivafb check.
-> Performing nvidiafb check.
-> Performing Xen check.
-> Cleaning kernel module build directory.
   executing: 'cd ./kernel; make clean'...
-> Building kernel module:
   executing: 'cd ./kernel; make module SYSSRC=/lib/modules/3.2.0-3-amd64/source SYSOUT=/lib/modules/3.2.0-3-amd64/build'...
   NVIDIA: calling KBUILD...
   make -C /lib/modules/3.2.0-3-amd64/build \
    KBUILD_SRC=/usr/src/linux-headers-3.2.0-3-common \
    KBUILD_EXTMOD="/tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel" -f /usr/src/linux-headers-3.2.0-3-common/Makefile \
    modules
   test -e include/generated/autoconf.h -a -e include/config/auto.conf || (     \
    echo;                               \
    echo "  ERROR: Kernel configuration is invalid.";       \
    echo "         include/generated/autoconf.h or include/config/auto.conf are missing.";\
    echo "         Run 'make oldconfig && make prepare' on kernel src to fix it.";  \
    echo;                               \
    /bin/false)
   mkdir -p /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/.tmp_versions ; rm -f /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/.tmp_versions/*
   make -f /usr/src/linux-headers-3.2.0-3-common/scripts/Makefile.build obj=/tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel
     gcc-4.6 -Wp,-MD,/tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/.nv.o.d  -nostdinc -isystem /usr/lib/gcc/x86_64-linux-gnu/4.6/include -I/usr/src/linux-headers-3.2.0-3-common/arch/x86/include -Iarch/x86/include/generated -Iinclude  -I/usr/src/linux-headers-3.2.0-3-common/include -include /usr/src/linux-headers-3.2.0-3-common/include/linux/kconfig.h   -I/tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel -D__KERNEL__ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Wno-format-security -fno-delete-null-pointer-checks -Os -m64 -mtune=generic -mno-red-zone -mcmodel=kernel -funit-at-a-time -maccumulate-outgoing-args -fstack-protector -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_FXSAVEQ=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -Wframe-larger-than=2048 -Wno-unused-but-set-variable -fomit-frame-pointer -g -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fconserve-stack -DCC_HAVE_ASM_GOTO   -I/tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel -Wall -MD -Wsign-compare -Wno-cast-
   qual -Wno-error -D__KERNEL__ -DMODULE -DNVRM -DNV_VERSION_STRING=\"304.22\" -Wno-unused-function -Wuninitialized -mno-red-zone -mcmodel=kernel -UDEBUG -U_DEBUG -DNDEBUG  -DMODULE  -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(nv)"  -D"KBUILD_MODNAME=KBUILD_STR(nvidia)" -c -o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/.tmp_nv.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nv.c
   In file included from /usr/src/linux-headers-3.2.0-3-common/include/linux/kernel.h:17:0,
                    from /usr/src/linux-headers-3.2.0-3-common/include/linux/sched.h:55,
                    from /usr/src/linux-headers-3.2.0-3-common/include/linux/utsname.h:35,
                    from /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nv-linux.h:38,
                    from /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nv.c:13:
   /usr/src/linux-headers-3.2.0-3-common/include/linux/bitops.h: In function ‘hweight_long’:
   /usr/src/linux-headers-3.2.0-3-common/include/linux/bitops.h:49:41: warning: signed and unsigned type in conditional expression [-Wsign-compare]
   In file included from /usr/src/linux-headers-3.2.0-3-common/arch/x86/include/asm/uaccess.h:575:0,
                    from /usr/src/linux-headers-3.2.0-3-common/include/linux/poll.h:14,
                    from /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nv-linux.h:97,
                    from /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nv.c:13:
   /usr/src/linux-headers-3.2.0-3-common/arch/x86/include/asm/uaccess_64.h: In function ‘copy_from_user’:
   /usr/src/linux-headers-3.2.0-3-common/arch/x86/include/asm/uaccess_64.h:53:6: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]

...snipped lots of compile output with the same warning...

     ld -m elf_x86_64   -r -o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nvidia.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nv-kernel.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nv.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nv-acpi.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nv-chrdev.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nv-cray.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nv-gvi.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nv-i2c.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nv-mempool.o /tmp/selfgz10141/NVI
   DIA-Linux-x86_64-304.22/kernel/nv-mlock.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nv-mmap.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nv-p2p.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nv-pat.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nv-procfs.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nv-usermap.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nv-vm.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nv-vtophys.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/os-agp.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/os-interface.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/os-mtrr.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/os-registry.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/os-smp.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/os-usermap.o 
   (cat /dev/null;   echo kernel//tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nvidia.ko;) > /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/modules.order
   make -f /usr/src/linux-headers-3.2.0-3-common/scripts/Makefile.modpost
     scripts/mod/modpost -m  -i /usr/src/linux-headers-3.2.0-3-amd64/Module.symvers -I /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/Module.symvers  -o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/Module.symvers -S -w  -s
   NVIDIA: left KBUILD.
   nvidia.ko failed to build!
   make[1]: *** [module] Error 1
   make: *** [module] Error 2
-> Error.
ERROR: Unable to build the NVIDIA kernel module.
ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.

UPDATE #2:

As per StarNamer's suggestion, I reinstalled linux-headers-3.2.0-3-amd64. After that was done, DKMS kicked in and tried again to compile the NVIDIA driver. Here are the contents of the file /var/lib/dkms/nvidia/304.22/build/make.log:

DKMS make.log for nvidia-304.22 for kernel 3.2.0-3-amd64 (x86_64)
Sun Jul 22 14:50:58 EEST 2012
If you are using a Linux 2.4 kernel, please make sure
you either have configured kernel sources matching your
kernel or the correct set of kernel headers installed
on your system.

If you are using a Linux 2.6 kernel, please make sure
you have configured kernel sources matching your kernel
installed on your system. If you specified a separate
output directory using either the "KBUILD_OUTPUT" or
the "O" KBUILD parameter, make sure to specify this
directory with the SYSOUT environment variable or with
the equivalent nvidia-installer command line option.

Depending on where and how the kernel sources (or the
kernel headers) were installed, you may need to specify
their location with the SYSSRC environment variable or
the equivalent nvidia-installer command line option.

*** Unable to determine the target kernel version. ***

make: *** [select_makefile] Error 1

UPDATE #3:

After days and days of googling, I started to wonder if this is NVIDIA's fault at all. Turns out, it's not. I tried to install VirtualBox 4.1 (from the testing repo), and I stumbled upon the same thing again:

# cat /var/lib/dkms/virtualbox/4.1.18/build/make.log 
DKMS make.log for virtualbox-4.1.18 for kernel 3.2.0-3-amd64 (x86_64)
Tue Jul 24 17:58:57 EEST 2012
make: Entering directory `/usr/src/linux-headers-3.2.0-3-amd64'
  LD      /var/lib/dkms/virtualbox/4.1.18/build/built-in.o
  LD      /var/lib/dkms/virtualbox/4.1.18/build/vboxdrv/built-in.o
  CC [M]  /var/lib/dkms/virtualbox/4.1.18/build/vboxdrv/linux/SUPDrv-linux.o
... snipped ...
  CC [M]  /var/lib/dkms/virtualbox/4.1.18/build/vboxpci/SUPR0IdcClientComponent.o
  CC [M]  /var/lib/dkms/virtualbox/4.1.18/build/vboxpci/linux/SUPR0IdcClient-linux.o
  LD [M]  /var/lib/dkms/virtualbox/4.1.18/build/vboxpci/vboxpci.o
  Building modules, stage 2.
  MODPOST 0 modules
make: Leaving directory `/usr/src/linux-headers-3.2.0-3-amd64'

And of course, no more details (as has already been said, it does seem like a linker problem, but I cannot be sure yet). So this must be more of a Debian / DKMS problem or a misconfiguration of some kind. However, I swear I didn't touch anything. I was simply doing daily apt-get upgrades. Then something went wrong, obviously.

UPDATE #4:

I did try creating a small module as described here: https://stackoverflow.com/questions/4715259/linux-modpost-does-not-build-anything. Indeed, I am still seeing MODPOST 0 modules. Here's the output when I put V=1 in the Makefile:

# make
make -C /lib/modules/3.2.0-3-amd64/build M=/home/dimi/code/hello V=1 modules
make[1]: Entering directory `/usr/src/linux-headers-3.2.0-3-amd64'
make -C /usr/src/linux-headers-3.2.0-3-amd64 \
    KBUILD_SRC=/usr/src/linux-headers-3.2.0-3-common \
    KBUILD_EXTMOD="/home/dimi/code/hello" -f /usr/src/linux-headers-3.2.0-3-common/Makefile \
    modules
test -e include/generated/autoconf.h -a -e include/config/auto.conf || (        \
    echo;                               \
    echo "  ERROR: Kernel configuration is invalid.";       \
    echo "         include/generated/autoconf.h or include/config/auto.conf are missing.";\
    echo "         Run 'make oldconfig && make prepare' on kernel src to fix it.";  \
    echo;                               \
    /bin/false)
mkdir -p /home/dimi/code/hello/.tmp_versions ; rm -f /home/dimi/code/hello/.tmp_versions/*
make -f /usr/src/linux-headers-3.2.0-3-common/scripts/Makefile.build obj=/home/dimi/code/hello
   gcc-4.6 -Wp,-MD,/home/dimi/code/hello/.hello.o.d  -nostdinc -isystem /usr/lib/gcc/x86_64-linux-gnu/4.6/include -I/usr/src/linux-headers-3.2.0-3-common/arch/x86/include -Iarch/x86/include/generated -Iinclude  -I/usr/src/linux-headers-3.2.0-3-common/include -include /usr/src/linux-headers-3.2.0-3-common/include/linux/kconfig.h   -I/home/dimi/code/hello -D__KERNEL__ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Wno-format-security -fno-delete-null-pointer-checks -Os -m64 -mtune=generic -mno-red-zone -mcmodel=kernel -funit-at-a-time -maccumulate-outgoing-args -fstack-protector -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_FXSAVEQ=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -Wframe-larger-than=2048 -Wno-unused-but-set-variable -fomit-frame-pointer -g -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fconserve-stack -DCC_HAVE_ASM_GOTO  -DMODULE  -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(hello)"  -D"KBUILD_MODNAME=KBUILD_STR(hello)" -c -o /home/dimi/code/hello/.tmp_hello.o /home/dimi/code/hello/hello.c
(cat /dev/null;   echo kernel//home/dimi/code/hello/hello.ko;) > /home/dimi/code/hello/modules.order
make -f /usr/src/linux-headers-3.2.0-3-common/scripts/Makefile.modpost
  scripts/mod/modpost -m  -i /usr/src/linux-headers-3.2.0-3-amd64/Module.symvers -I /home/dimi/code/hello/Module.symvers  -o /home/dimi/code/hello/Module.symvers -S -w -c -s
make[1]: Leaving directory `/usr/src/linux-headers-3.2.0-3-amd64'

And here is what I see when I remove V=1:

# make
make -C /lib/modules/3.2.0-3-amd64/build M=/home/dimi/code/hello modules
make[1]: Entering directory `/usr/src/linux-headers-3.2.0-3-amd64'
  CC [M]  /home/dimi/code/hello/hello.o
  Building modules, stage 2.
  MODPOST 0 modules
make[1]: Leaving directory `/usr/src/linux-headers-3.2.0-3-amd64'
  • Updated the question with a log file.
    – dimitarvp
    Commented Jul 21, 2012 at 21:39
  • 1
    The error message says include/generated/autoconf.h or include/config/auto.conf is missing. On my system, these are in /usr/src/linux-headers-3.2.0-3-686-pae/. Have you checked that these are in place (your location will presumably be slightly different) or tried re-installing the linux header files?
    – StarNamer
    Commented Jul 22, 2012 at 0:01
  • I did. The new problem is included as UPDATE #2. Thank you for being helpful.
    – dimitarvp
    Commented Jul 22, 2012 at 11:51
  • 1
    I agree with the edit that this looks like a DKMS problem. I am not sure if it's specific to NVIDIA - I've never had a problem with VirtualBox's DKMS setup. I think you'd probably be OK with the 304.22 driver if you didn't request it to add the DKMS functionality.
    – StarNamer
    Commented Jul 24, 2012 at 15:52
  • 1
    I just looked more closely at your VirtualBox make.log and I can't actually see an error there, apart from the fact it built zero modules! One of the references I found to this was here (stackoverflow.com/questions/4715259/…) where it was suggested that the kernel configuration was broken, but it was not resolved; the problem just 'went away'. I know you reinstalled linux-headers, but it might be worth trying to purge this package (and perhaps dkms and virtualbox-dkms) and then reinstall them all.
    – StarNamer
    Commented Jul 24, 2012 at 16:08

4 Answers

7

SOLVED!

Simple as that: /root/.bashrc had this inside:

 export GREP_OPTIONS='--color=always'

Changed it to:

 export GREP_OPTIONS='--color=never'

...and restarted the root shell (of course; do not omit this step). Everything started working again. Both the NVIDIA and VirtualBox kernel modules built on the first try. I am so happy! :-)

Then again though, I am slightly disappointed by the kernel build tools. They should know better and pass --color=never everywhere they use grep; or rather, store the old value of GREP_OPTIONS, override it for the lifetime of the build process, then restore it.
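The same idea can also be applied from the caller's side, without editing .bashrc: strip GREP_OPTIONS from the environment for just the one build command. What follows is only a sketch of that workaround, not something the kernel build tools provide; without_grep_options is a name made up for this illustration, and the commented usage lines simply reuse commands from the question.

```shell
#!/bin/sh
# Run a command with GREP_OPTIONS removed from its environment.
# The subshell keeps the change local, so the caller's own setting
# is still there once the command has finished.
without_grep_options() {
    (
        unset GREP_OPTIONS
        "$@"
    )
}

# Usage sketch (commands taken from the question, run as root):
#   without_grep_options dpkg-reconfigure nvidia-kernel-dkms
#   without_grep_options make -C /lib/modules/3.2.0-3-amd64/build M=/home/dimi/code/hello modules
```

Because the variable is only unset inside the subshell, an interactive grep in the same terminal keeps whatever coloring was configured.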

I am hopeful that my epic one-week battle with this problem will prove valuable both to the community and the kernel build tools developers.

A very warm thanks to the people who were with me and tried to help.

(All credits go here: http://forums.gentoo.org/viewtopic-p-4156366.html#4156366)

  • 6
    Congratulations on finding the solution. I'd think that qualifies as a bug in the kernel build tools as it really should not break just because someone likes color output from grep!
    – StarNamer
    Commented Jul 28, 2012 at 19:28
  • 1
    Nice find. Frustrating and annoying, too. I think it's the kind of thing that never gets tested, though, because most people who want colour output use --colour=auto so that ESC codes don't pollute the output when it's being piped or redirected. On the few occasions you might want colourised output piped (e.g. when piping to less -r) you can specify --colour on the command line.
    – cas
    Commented Jul 28, 2012 at 23:13
  • The only trouble is, I have no idea whatsoever whom I should show this page to, so that the Linux kernel build tools team notices and maybe does something about it.
    – dimitarvp
    Commented Jul 29, 2012 at 13:26
  • @dimitko, you should try the kernel bugzilla.
    – vonbrand
    Commented Mar 24, 2014 at 2:03
3
+50

Have you tried purging and reinstalling dkms?

You could use apt-get purge dkms and that will also purge all the packages that depend on it, so you'll have to reinstall them afterwards.

If you don't want the dependent packages purged too, you could use dpkg:

dpkg --purge --force-depends dkms

reinstall with the usual: apt-get install dkms

FWIW, I have two machines here (running debian sid) with kernel linux-image-3.2.0-3-amd64 and the nvidia-kernel-dkms 302.17-3 and related packages installed. The dkms module compiled without a problem. A third machine (my main desktop) is still running nvidia-kernel-dkms 295.53-1, mostly because I don't want to have to log out.

BTW, you mentioned purging and re-installing the various nvidia packages with aptitude. There are several nvidia packages that don't have nvidia in the package name. Here's the solution I came up with to hold/unhold nvidia pkgs (I usually only want to upgrade nvidia pkgs when I'm willing/able to log out of my current X session...and after a few unpleasant surprises with new nvidia versions, I like to test them on my least-important machine first):

(note: you'll need my dlocate package installed to run this)

$ cat /usr/local/sbin/hold-nvidia.sh 
#! /bin/bash

PKGS=$(dlocate -l nvidia cuda vdpau | awk '/^[hi]i/ {print $2}' | sed -e 's/:.*//')

echo dpkg-hold $PKGS
dpkg-hold $PKGS

There's a nearly identical one for un-holding packages (runs dpkg-unhold instead of dpkg-hold), and it would be trivial to make it run dpkg --purge or apt-get purge instead.
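The filter in the middle of that pipeline can be tried on its own, without dlocate installed. This is just a sketch under assumptions: installed_names is a name invented here, the sample lines (including their version strings and descriptions) are made-up input, and it assumes dlocate -l prints the same column layout as dpkg -l, which is what the awk pattern in the script implies.

```shell
#!/bin/sh
# Read `dpkg -l`-style lines on stdin and print the names of packages whose
# status is "ii" (installed) or "hi" (held, installed), dropping any
# ":arch" suffix such as ":amd64".
installed_names() {
    awk '/^[hi]i/ {print $2}' | sed -e 's/:.*//'
}

# Made-up sample input; the "rc" line (removed, config files left) is skipped.
printf '%s\n' \
    'ii  nvidia-kernel-dkms:amd64  302.17-3  sample description' \
    'rc  nvidia-glx                295.53-1  sample description' \
    'hi  libvdpau1:amd64           0.4.1-6   sample description' \
    | installed_names
```

Swapping dpkg-hold for another command then only changes what is done with the resulting list of names.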

  • Thank you. Didn't try to purge dkms itself yet -- I am gonna do it after I am done working for the day. I got another question for you though -- did you read my edits on trying to create a "hello world" module? The part where no .ko file ever gets created? Do you have any idea how I should remedy this? (modpost is inside the linux-kbuild-3.2 package btw.)
    – dimitarvp
    Commented Jul 25, 2012 at 23:14
  • 1
    sorry, kind of skipped over it in a tl;dr fashion :). I did notice just now in UPDATE #1 that you have several non-standard directories in your PATH...is there any chance there's a conflicting binary/script in one of those dirs? have you tried setting a "standard" PATH (just the usual system dirs) before building the dkms module?
    – cas
    Commented Jul 25, 2012 at 23:24
  • You could be right. I got RVM, and it's infamous for overriding important things in one's environment. Just finished my 18-hour freelancing "working day", gonna crash now, but both your suggestions will be tried, first thing when I get up. Thanks.
    – dimitarvp
    Commented Jul 26, 2012 at 2:38
  • Sadly, no joy. Removed RVM from my .bashrc files (both root's and the regular user's), restarted the console, then purged dkms (which naturally purged nvidia-kernel-dkms and virtualbox-dkms). Then I did apt-get clean (even though debsums didn't report problems), then I reinstalled them. Nothing changed. Can we try to solve the issue in Update #4 btw? (I believe it's the root of the problem.) I got a tiny "hello world" module, and even it doesn't get a .ko file built. Do you have any idea on this?
    – dimitarvp
    Commented Jul 26, 2012 at 11:52
  • 1
    sorry, ignore that. i read it wrong. the make from experimental causes the problem. reverting fixes it.
    – cas
    Commented Jul 26, 2012 at 22:16
2

This is no good:

PATH: /usr/local/rvm/gems/ruby-1.9.3-p194/bin:\
    /usr/local/rvm/gems/ruby-1.9.3-p194@global/bin:\
    /usr/local/rvm/rubies/ruby-1.9.3-p194/bin:\
    /usr/local/rvm/bin:\
    /usr/local/sbin:\
    /usr/local/bin:\
    /usr/sbin:\
    /usr/bin:\
    /sbin:\
    /bin

I am willing to bet that in at least a few of those directories you had $PATHed ahead of /bin and /sbin - especially the ruby ones - you had common shell app wrapper scripts for colorizing output. Maybe you even had similar configuration applied in /etc/skel in which case not even /bin/env -i grep could have saved you from yourself.

This is why people compile in chroot.

P.S. I'm only so critical because I had to learn the same lesson the same way a couple of years back. You probably would not have needed the =never if your $PATH was clean. Also, you can just use --color=auto, in which case the terminal escapes are only used if grep's stdout is a terminal; in other words, not in a |pipe to gcc.

Or, even better, instead of setting an inflexible shell alias with:

alias grep=grep\ --color=anything

You could take advantage of grep's $ENV setting:

GREP_COLOR=auto
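
The difference between --color=always and --color=auto in a pipe is easy to check by hand. A small sketch, assuming GNU grep; contains_escape is a helper invented for this illustration, and the CONFIG_MODULES=y input line is just sample text.

```shell
#!/bin/sh
# Succeed if the string passed as $1 contains an ESC byte, i.e. the start
# of a terminal color sequence.
contains_escape() {
    esc=$(printf '\033')
    case "$1" in
        *"$esc"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Inside $(...) grep's stdout is a pipe, not a terminal.
always_out=$(printf 'CONFIG_MODULES=y\n' | grep --color=always MODULES)
auto_out=$(printf 'CONFIG_MODULES=y\n' | grep --color=auto MODULES)

if contains_escape "$always_out"; then
    echo "always: escape codes leak into the pipe"
fi
if ! contains_escape "$auto_out"; then
    echo "auto: plain text, safe for scripts to parse"
fi
```

Any script that compares or parses the first result as plain text gets the escape bytes mixed into it, which is the kind of silent breakage described in the accepted answer.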
  • 1
    I appreciate the feedback and the advice. While the original problem was that I was forcing grep to always colorize its output, I am taking a note to compile in chroot from here on, indeed.
    – dimitarvp
    Commented Mar 24, 2014 at 20:15
  • @dimitko Yeah. It was a little on the jerk side, though, maybe. I think I may have tempered it a bit in postscript and added a little more info.
    – mikeserv
    Commented Mar 25, 2014 at 3:09
  • I agree, and don't worry about it, I didn't take offense. ;) I felt incredibly stupid after discovering the "solution". Even if I don't use the GREP_COLOR environment variable, I believe grep assumes auto-colorizing by default anyway. So in the end, I just shouldn't have meddled with it. :D Anyway though, and again, experimenting in chroot definitely works better, can't deny it.
    – dimitarvp
    Commented Mar 25, 2014 at 11:10
1

I am running Debian Wheezy (32-bit) with NVIDIA drivers. I also recently tried the DKMS 302.17 version, but reverted to the 'Official' NVIDIA build (295.59), not because it failed to install, but because it doesn't include the nvidia-settings program and I need to reset the overscan on my HD TV (secondary monitor).

Having said that, you shouldn't need to link /usr/bin/gcc to gcc-4.6 to run the older version. I just executed CC=gcc-4.6 before running the 'Official' 295.59 installation after I purged all the nvidia stuff that apt-get had installed.
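
One detail worth spelling out: a bare CC=gcc-4.6 on its own line only sets a shell variable, so a program started afterwards sees it only if the variable is exported or given as a prefix on that program's command line. Here is a small sketch of the difference, using sh -c as a stand-in for the installer (the installer itself is not run here).

```shell
#!/bin/sh
# Each child shell prints CC as it sees it, or "unset" if the variable
# is not in its environment.

unset CC
CC=gcc-4.6                      # plain assignment: not exported
plain=$(sh -c 'echo "${CC-unset}"')

unset CC
prefixed=$(CC=gcc-4.6 sh -c 'echo "${CC-unset}"')   # set for this one command

export CC=gcc-4.6               # exported: visible to every later child
exported=$(sh -c 'echo "${CC-unset}"')
unset CC

echo "plain: $plain, prefixed: $prefixed, exported: $exported"
```

The prefix form is the least intrusive of the three, since it leaves the rest of the shell session untouched.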

Wheezy is still the testing version, so it's possible you've hit a bug that hasn't been properly tested in the 64-bit build. As I said, my 32-bit upgrade showed no problem apart from missing functionality. You might want to look at logging a bug report if you are sure everything's OK on your system.

My recommendation would be to revert to the 295.59 'Official' version (either using your link or defining CC to use the right version of gcc) and then to wait for an update to the nvidia-kernel-dkms module (or until Wheezy is released as stable).

Of course, if you check here you'll see that the code is Non-free anyway, so it's probably using a pre-release version of the 'Official' binary which would simply add to the possibilities of problems.

  • I tried downgrading. Sadly absolutely the same problem appeared. I forgot to include that in the original post, sorry about that. But after purging everything that apt-get installed (and I did log that and then properly removed it), and issued the apt-get install (with the 295.59 version), I hit exactly the same problem. It's why I said "evidently something broke", as unclear as this sounds.
    – dimitarvp
    Commented Jul 20, 2012 at 19:24
  • 1
    I actually downloaded the file NVIDIA-Linux-x86-295.59.run from the NVIDIA website (there's probably an amd64 version) and built it that way. This is what I assumed you meant by the 'official' version. I also went through all installed packages with nvidia in the name (using aptitude) and purged everything to avoid problems. I'm also running kernel 3.2.0-3, so I don't think a kernel downgrade would help.
    – StarNamer
    Commented Jul 21, 2012 at 0:26
  • Well, that would be my last resort, to be fair. I hate re-running such procedures every time I get an update to the kernel. That's what DKMS is for. Anyway, I believe my case is some vague linker problem, which I hoped this site can help with. :-(
    – dimitarvp
    Commented Jul 21, 2012 at 10:35
