
I have Intel integrated graphics and an NVIDIA GTX 1650 Mobile graphics card. My Xorg server runs only on the integrated GPU. The NVIDIA GPU loads the nvidia driver on boot. I have libvirt hooks defined to dynamically bind the NVIDIA card to the VFIO driver on VM startup and unbind it after the VM stops.

Guide I followed: Bryan Steiner GitHub

Kernel logs before the host froze:

Jun 17 22:37:50 ash kernel: [ 8100.874501] VFIO - User Level meta-driver version: 0.3
Jun 17 22:37:50 ash kernel: VFIO - User Level meta-driver version: 0.3
Jun 17 22:37:50 ash kernel: vfio-pci 0000:01:00.0: vgaarb: VGA decodes changed: olddecodes=none,decodes=io+mem:owns=none
Jun 17 22:37:50 ash kernel: [ 8100.885111] vfio-pci 0000:01:00.0: vgaarb: VGA decodes changed: olddecodes=none,decodes=io+mem:owns=none
Jun 17 22:37:52 ash kernel: [ 8102.336272] virbr0: port 1(vnet2) entered blocking state
Jun 17 22:37:52 ash kernel: [ 8102.336279] virbr0: port 1(vnet2) entered disabled state
Jun 17 22:37:52 ash kernel: [ 8102.336284] vnet2: entered allmulticast mode
Jun 17 22:37:52 ash kernel: [ 8102.336338] vnet2: entered promiscuous mode
Jun 17 22:37:52 ash kernel: [ 8102.336540] virbr0: port 1(vnet2) entered blocking state
Jun 17 22:37:52 ash kernel: [ 8102.336542] virbr0: port 1(vnet2) entered listening state
Jun 17 22:37:52 ash kernel: [ 8102.336284] vnet2: entered allmulticast mode
Jun 17 22:37:52 ash kernel: [ 8102.336338] vnet2: entered promiscuous mode
Jun 17 22:37:52 ash kernel: [ 8102.336540] virbr0: port 1(vnet2) entered blocking state
Jun 17 22:37:52 ash kernel: [ 8102.336542] virbr0: port 1(vnet2) entered listening state
Jun 17 22:37:52 ash kernel: virbr0: port 1(vnet2) entered blocking state
Jun 17 22:37:52 ash kernel: virbr0: port 1(vnet2) entered disabled state
Jun 17 22:37:52 ash kernel: vnet2: entered allmulticast mode
Jun 17 22:37:52 ash kernel: vnet2: entered promiscuous mode
Jun 17 22:37:52 ash kernel: virbr0: port 1(vnet2) entered blocking state
Jun 17 22:37:52 ash kernel: virbr0: port 1(vnet2) entered listening state
Jun 17 22:37:52 ash kernel: vfio_iommu_type1_attach_group: No interrupt remapping support.  Use the module param "allow_unsafe_interrupts" to enable VFIO IOMMU support on this platform
Jun 17 22:37:52 ash kernel: [ 8102.399736] vfio_iommu_type1_attach_group: No interrupt remapping support.  Use the module param "allow_unsafe_interrupts" to enable VFIO IOMMU support on this platform
Jun 17 22:37:52 ash kernel: [ 8102.411654] virbr0: port 1(vnet2) entered disabled state
Jun 17 22:37:52 ash kernel: [ 8102.411781] vnet2 (unregistering): left allmulticast mode
Jun 17 22:37:52 ash kernel: [ 8102.411785] vnet2 (unregistering): left promiscuous mode
Jun 17 22:37:52 ash kernel: [ 8102.411786] virbr0: port 1(vnet2) entered disabled state
Jun 17 22:37:52 ash kernel: virbr0: port 1(vnet2) entered disabled state
Jun 17 22:37:52 ash kernel: vnet2 (unregistering): left allmulticast mode
Jun 17 22:37:52 ash kernel: vnet2 (unregistering): left promiscuous mode
Jun 17 22:37:52 ash kernel: virbr0: port 1(vnet2) entered disabled state
Jun 17 22:37:55 ash kernel: vfio-pci 0000:01:00.0: not ready 1023ms after resume; waiting
Jun 17 22:37:55 ash kernel: [ 8105.581639] vfio-pci 0000:01:00.0: not ready 1023ms after resume; waiting
Jun 17 22:37:56 ash kernel: [ 8106.669406] vfio-pci 0000:01:00.0: not ready 2047ms after resume; waiting
Jun 17 22:37:56 ash kernel: vfio-pci 0000:01:00.0: not ready 2047ms after resume; waiting
Jun 17 22:37:58 ash kernel: vfio-pci 0000:01:00.0: not ready 4095ms after resume; waiting
Jun 17 22:37:58 ash kernel: [ 8108.781711] vfio-pci 0000:01:00.0: not ready 4095ms after resume; waiting
Jun 17 22:38:03 ash kernel: [ 8113.261646] vfio-pci 0000:01:00.0: not ready 8191ms after resume; waiting
Jun 17 22:38:03 ash kernel: vfio-pci 0000:01:00.0: not ready 8191ms after resume; waiting
Jun 17 22:38:11 ash kernel: vfio-pci 0000:01:00.0: not ready 16383ms after resume; waiting
Jun 17 22:38:11 ash kernel: [ 8121.965484] vfio-pci 0000:01:00.0: not ready 16383ms after resume; waiting
Jun 17 22:38:28 ash kernel: [ 8138.861452] vfio-pci 0000:01:00.0: not ready 32767ms after resume; waiting
Jun 17 22:38:28 ash kernel: vfio-pci 0000:01:00.0: not ready 32767ms after resume; waiting
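
The `vfio_iommu_type1_attach_group` message in the log above names the `allow_unsafe_interrupts` module parameter. A minimal sketch for checking whether it is currently set, assuming the `vfio_iommu_type1` module is loaded and exposes the parameter under sysfs. The function name `unsafe_interrupts_status` and its optional path argument are illustrative only and not part of any tool:

```shell
#!/usr/bin/env bash
# Report the current allow_unsafe_interrupts setting of vfio_iommu_type1.
# The optional argument (an alternate parameter file) is an assumption added
# so the function can be exercised without the module being loaded.
unsafe_interrupts_status() {
    local param="${1:-/sys/module/vfio_iommu_type1/parameters/allow_unsafe_interrupts}"
    if [ ! -r "$param" ]; then
        # Module not loaded, or the parameter is not exposed.
        echo "unknown"
        return 1
    fi
    # Boolean module parameters read back as Y or N.
    case "$(cat "$param")" in
        Y|y|1) echo "enabled" ;;
        *)     echo "disabled" ;;
    esac
}
```

If this reports `disabled`, the parameter can be set with a line such as `options vfio_iommu_type1 allow_unsafe_interrupts=1` in a file under `/etc/modprobe.d/`. This gives up interrupt isolation, so it is a workaround for the missing interrupt remapping rather than a fix, and it is not confirmed to be related to the freeze.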

Kernel info:

Linux ash 6.9.4-gentoo #4 SMP PREEMPT_DYNAMIC Sun Jun 16 12:02:50 IST 2024 x86_64 Intel(R) Core(TM) i5-10300H CPU @ 2.50GHz GenuineIntel GNU/Linux

IOMMU enabled:

[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.9.4-gentoo root=PARTUUID=4a5f5d65-81be-e24d-bedc-aeb2f0b24295 ro i8042.nopnp mmio_stale_data=full libata.noacpi=1 resume=PARTUUID=c302ea25-d8e2-064e-8ebd-c79276fad2cc drm.edid_firmware=edid/1920x1080_BOE08E8.bin i915.modeset=1 iommu=pt intel_iommu=on
[    0.001382] ACPI: DMAR 0x000000009BC4C000 000098 (v01 LENOVO CB-01 00000001      01000013)
[    0.001417] ACPI: Reserving DMAR table memory at [mem 0x9bc4c000-0x9bc4c097]
[    0.056165] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.9.4-gentoo root=PARTUUID=4a5f5d65-81be-e24d-bedc-aeb2f0b24295 ro i8042.nopnp mmio_stale_data=full libata.noacpi=1 resume=PARTUUID=c302ea25-d8e2-064e-8ebd-c79276fad2cc drm.edid_firmware=edid/1920x1080_BOE08E8.bin i915.modeset=1 iommu=pt intel_iommu=on
[    0.056247] DMAR: IOMMU enabled
[    0.114988] DMAR: Host address width 39
[    0.114991] DMAR: DRHD base: 0x000000fed90000 flags: 0x0
[    0.114999] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap 1c0000c40660462 ecap 19e2ff0505e
[    0.115004] DMAR: DRHD base: 0x000000fed91000 flags: 0x1
[    0.115009] DMAR: dmar1: reg_base_addr fed91000 ver 1:0 cap d2008c40660462 ecap f050da
[    0.115014] DMAR: RMRR base: 0x0000009b145000 end: 0x0000009b164fff
[    0.115018] DMAR: RMRR base: 0x0000009d000000 end: 0x0000009f7fffff
[    0.287373] iommu: Default domain type: Passthrough (set via kernel command line)
[    0.387530] DMAR: No ATSR found
[    0.387533] DMAR: No SATC found
[    0.387536] DMAR: IOMMU feature fl1gp_support inconsistent
[    0.387537] DMAR: IOMMU feature pgsel_inv inconsistent
[    0.387541] DMAR: IOMMU feature nwfs inconsistent
[    0.387544] DMAR: IOMMU feature pasid inconsistent
[    0.387547] DMAR: IOMMU feature eafs inconsistent
[    0.387551] DMAR: IOMMU feature prs inconsistent
[    0.387553] DMAR: IOMMU feature nest inconsistent
[    0.387554] DMAR: IOMMU feature mts inconsistent
[    0.387560] DMAR: IOMMU feature sc_support inconsistent
[    0.387563] DMAR: IOMMU feature dev_iotlb_support inconsistent
[    0.387568] DMAR: dmar0: Using Queued invalidation
[    0.387578] DMAR: dmar1: Using Queued invalidation
[    0.387816] pci 0000:00:02.0: Adding to iommu group 0
[    0.387864] pci 0000:00:00.0: Adding to iommu group 1
[    0.387885] pci 0000:00:01.0: Adding to iommu group 2
[    0.387900] pci 0000:00:04.0: Adding to iommu group 3
[    0.387930] pci 0000:00:08.0: Adding to iommu group 4
[    0.387951] pci 0000:00:12.0: Adding to iommu group 5
[    0.387975] pci 0000:00:14.0: Adding to iommu group 6
[    0.387989] pci 0000:00:14.2: Adding to iommu group 6
[    0.388003] pci 0000:00:14.3: Adding to iommu group 7
[    0.388028] pci 0000:00:15.0: Adding to iommu group 8
[    0.388042] pci 0000:00:15.1: Adding to iommu group 8
[    0.388061] pci 0000:00:16.0: Adding to iommu group 9
[    0.388075] pci 0000:00:17.0: Adding to iommu group 10
[    0.388100] pci 0000:00:1d.0: Adding to iommu group 11
[    0.388126] pci 0000:00:1d.6: Adding to iommu group 12
[    0.388162] pci 0000:00:1f.0: Adding to iommu group 13
[    0.388177] pci 0000:00:1f.3: Adding to iommu group 13
[    0.388193] pci 0000:00:1f.4: Adding to iommu group 13
[    0.388207] pci 0000:00:1f.5: Adding to iommu group 13
[    0.388216] pci 0000:01:00.0: Adding to iommu group 2
[    0.388223] pci 0000:01:00.1: Adding to iommu group 2
[    0.388246] pci 0000:06:00.0: Adding to iommu group 14
[    0.388269] pci 0000:07:00.0: Adding to iommu group 15
[    0.388383] DMAR: Intel(R) Virtualization Technology for Directed I/O

IOMMU group

IOMMU Group 2 00:01.0 PCI bridge [0604]: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 02)

IOMMU Group 2 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU117M [GeForce GTX 1650 Mobile / Max-Q] [10de:1f99] (rev a1)

IOMMU Group 2 01:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:10fa] (rev a1)
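
Group 2 contains the PCIe root port `00:01.0` together with both GPU functions. A rough sketch of how group membership can be enumerated from sysfs, assuming the standard `/sys/kernel/iommu_groups` layout. The function name `list_iommu_groups` and its optional root argument are hypothetical and only there for illustration:

```shell
#!/usr/bin/env bash
# Print "IOMMU Group <n> <pci-address>" for every device in every group.
# The optional argument (an alternate sysfs root) is an assumption added
# so the function can be exercised against a fake directory tree.
list_iommu_groups() {
    local root="${1:-/sys/kernel/iommu_groups}"
    local group dev
    for group in "$root"/*; do
        # Skip anything that is not a group directory (or an unmatched glob).
        [ -d "$group/devices" ] || continue
        for dev in "$group"/devices/*; do
            [ -e "$dev" ] || continue
            echo "IOMMU Group ${group##*/} ${dev##*/}"
        done
    done
}
```

Passing each address to `lspci -nns <address>` adds the device description and vendor/device IDs, giving lines in the same shape as the listing above.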

libvirt hooks tree:

./
├── kvm.conf
├── qemu
└── qemu.d
    └── win10
        ├── prepare
        │   └── begin
        │       └── bind_vfio.sh
        └── release
            └── end
                └── unbind_vfio.sh

kvm.conf

VIRSH_GPU_VIDEO=pci_0000_01_00_0
VIRSH_GPU_AUDIO=pci_0000_01_00_1

bind_vfio.sh

#!/bin/bash

## Load the config file
source "/etc/libvirt/hooks/kvm.conf"

## Load vfio
modprobe vfio
modprobe vfio_iommu_type1
modprobe vfio_pci

## Unbind gpu from nvidia and bind to vfio
virsh nodedev-detach "$VIRSH_GPU_VIDEO"
virsh nodedev-detach "$VIRSH_GPU_AUDIO"

unbind_vfio.sh

#!/bin/bash

## Load the config file
source "/etc/libvirt/hooks/kvm.conf"

## Unbind gpu from vfio and bind to nvidia
virsh nodedev-reattach "$VIRSH_GPU_VIDEO"
virsh nodedev-reattach "$VIRSH_GPU_AUDIO"

## Unload vfio
modprobe -r vfio_pci
modprobe -r vfio_iommu_type1
modprobe -r vfio
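
To confirm what the two hook scripts above actually did, the driver bound to each GPU function can be read from sysfs. A minimal sketch, assuming the standard `/sys/bus/pci/devices/<address>/driver` symlink. The function name `current_driver` and its optional second argument are hypothetical:

```shell
#!/usr/bin/env bash
# Print the driver currently bound to a PCI device, or "none" if unbound.
# The optional second argument (an alternate sysfs devices root) is an
# assumption added so the function can be exercised against a fake tree.
current_driver() {
    local addr="$1"
    local root="${2:-/sys/bus/pci/devices}"
    local link="$root/$addr/driver"
    if [ -L "$link" ]; then
        # The symlink target ends in the driver name, e.g. .../drivers/vfio-pci
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}
```

For example, `current_driver 0000:01:00.0` would be expected to print `vfio-pci` after the detach step and `nvidia` after the reattach step, if the hooks behave as intended.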

qemu script to add per-guest qemu hooks:

#!/usr/bin/env bash
#
# Author: SharkWipf
#
# Copy this file to /etc/libvirt/hooks, make sure it's called "qemu".
# After this file is installed, restart libvirt.
# From now on, you can easily add per-guest qemu hooks.
# Add your hooks in /etc/libvirt/hooks/qemu.d/vm_name/hook_name/state_name.
# For a list of available hooks, please refer to https://www.libvirt.org/hooks.html
#

GUEST_NAME="$1"
HOOK_NAME="$2"
STATE_NAME="$3"
MISC="${@:4}"

BASEDIR="$(dirname "$0")"

HOOKPATH="$BASEDIR/qemu.d/$GUEST_NAME/$HOOK_NAME/$STATE_NAME"

set -e # If a script exits with an error, we should as well.

# check if it's a non-empty executable file
if [ -f "$HOOKPATH" ] && [ -s "$HOOKPATH" ] && [ -x "$HOOKPATH" ]; then
    eval \"$HOOKPATH\" "$@"
elif [ -d "$HOOKPATH" ]; then
    while read file; do
        # check for null string
        if [ ! -z "$file" ]; then
            eval \"$file\" "$@"
        fi
    done <<< "$(find -L "$HOOKPATH" -maxdepth 1 -type f -executable -print;)"
fi

The QEMU XML config is similar to the one in the guide, except that I only pass through the GPU and its audio device as mentioned above; I am not using the NVMe passthrough shown in the guide.

Also, the host machine is a laptop, whereas the guide assumes multiple monitors. Could this be the main reason the host freezes? The kernel logs show vfio-pci waiting on the device at PCI address 0000:01:00.0, which is the NVIDIA GPU.
