I have Intel integrated graphics and an NVIDIA GTX 1650 Mobile card. My Xorg server runs only on the integrated GPU; the NVIDIA GPU is claimed by the nvidia driver at boot. I have libvirt hooks that dynamically bind the NVIDIA card to the vfio-pci driver on VM startup and rebind it to nvidia after the VM stops.
Guide I followed: Bryan Steiner's guide on GitHub
Kernel logs before the host freezes:
Jun 17 22:37:50 ash kernel: [ 8100.874501] VFIO - User Level meta-driver version: 0.3
Jun 17 22:37:50 ash kernel: [ 8100.885111] vfio-pci 0000:01:00.0: vgaarb: VGA decodes changed: olddecodes=none,decodes=io+mem:owns=none
Jun 17 22:37:52 ash kernel: [ 8102.336272] virbr0: port 1(vnet2) entered blocking state
Jun 17 22:37:52 ash kernel: [ 8102.336279] virbr0: port 1(vnet2) entered disabled state
Jun 17 22:37:52 ash kernel: [ 8102.336284] vnet2: entered allmulticast mode
Jun 17 22:37:52 ash kernel: [ 8102.336338] vnet2: entered promiscuous mode
Jun 17 22:37:52 ash kernel: [ 8102.336540] virbr0: port 1(vnet2) entered blocking state
Jun 17 22:37:52 ash kernel: [ 8102.336542] virbr0: port 1(vnet2) entered listening state
Jun 17 22:37:52 ash kernel: [ 8102.399736] vfio_iommu_type1_attach_group: No interrupt remapping support. Use the module param "allow_unsafe_interrupts" to enable VFIO IOMMU support on this platform
Jun 17 22:37:52 ash kernel: [ 8102.411654] virbr0: port 1(vnet2) entered disabled state
Jun 17 22:37:52 ash kernel: [ 8102.411781] vnet2 (unregistering): left allmulticast mode
Jun 17 22:37:52 ash kernel: [ 8102.411785] vnet2 (unregistering): left promiscuous mode
Jun 17 22:37:52 ash kernel: [ 8102.411786] virbr0: port 1(vnet2) entered disabled state
Jun 17 22:37:55 ash kernel: [ 8105.581639] vfio-pci 0000:01:00.0: not ready 1023ms after resume; waiting
Jun 17 22:37:56 ash kernel: [ 8106.669406] vfio-pci 0000:01:00.0: not ready 2047ms after resume; waiting
Jun 17 22:37:58 ash kernel: [ 8108.781711] vfio-pci 0000:01:00.0: not ready 4095ms after resume; waiting
Jun 17 22:38:03 ash kernel: [ 8113.261646] vfio-pci 0000:01:00.0: not ready 8191ms after resume; waiting
Jun 17 22:38:11 ash kernel: [ 8121.965484] vfio-pci 0000:01:00.0: not ready 16383ms after resume; waiting
Jun 17 22:38:28 ash kernel: [ 8138.861452] vfio-pci 0000:01:00.0: not ready 32767ms after resume; waiting
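One line in the log stands out: `vfio_iommu_type1_attach_group: No interrupt remapping support`. As far as I understand, VFIO refuses the attach when the platform has no (enabled) interrupt remapping, unless that check is explicitly overridden. A sketch of the usual opt-in (the modprobe.d file name is my choice, not mandated; note this weakens interrupt isolation between guests, so it is a workaround rather than a fix):

```shell
# One-off, at runtime (takes effect for groups attached afterwards):
echo 1 | sudo tee /sys/module/vfio_iommu_type1/parameters/allow_unsafe_interrupts

# Persistent, via modprobe.d:
# WARNING: bypasses an isolation check; a malicious guest could
# potentially inject spurious interrupts.
echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" | \
    sudo tee /etc/modprobe.d/vfio_iommu.conf
```

Since this is a custom Gentoo kernel, it may also be worth checking that CONFIG_IRQ_REMAP is enabled, since interrupt remapping support depends on it.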
Kernel info:
Linux ash 6.9.4-gentoo #4 SMP PREEMPT_DYNAMIC Sun Jun 16 12:02:50 IST 2024 x86_64 Intel(R) Core(TM) i5-10300H CPU @ 2.50GHz GenuineIntel GNU/Linux
IOMMU enabled:
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.9.4-gentoo root=PARTUUID=4a5f5d65-81be-e24d-bedc-aeb2f0b24295 ro i8042.nopnp mmio_stale_data=full libata.noacpi=1 resume=PARTUUID=c302ea25-d8e2-064e-8ebd-c79276fad2cc drm.edid_firmware=edid/1920x1080_BOE08E8.bin i915.modeset=1 iommu=pt intel_iommu=on
[ 0.001382] ACPI: DMAR 0x000000009BC4C000 000098 (v01 LENOVO CB-01 00000001 01000013)
[ 0.001417] ACPI: Reserving DMAR table memory at [mem 0x9bc4c000-0x9bc4c097]
[ 0.056165] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.9.4-gentoo root=PARTUUID=4a5f5d65-81be-e24d-bedc-aeb2f0b24295 ro i8042.nopnp mmio_stale_data=full libata.noacpi=1 resume=PARTUUID=c302ea25-d8e2-064e-8ebd-c79276fad2cc drm.edid_firmware=edid/1920x1080_BOE08E8.bin i915.modeset=1 iommu=pt intel_iommu=on
[ 0.056247] DMAR: IOMMU enabled
[ 0.114988] DMAR: Host address width 39
[ 0.114991] DMAR: DRHD base: 0x000000fed90000 flags: 0x0
[ 0.114999] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap 1c0000c40660462 ecap 19e2ff0505e
[ 0.115004] DMAR: DRHD base: 0x000000fed91000 flags: 0x1
[ 0.115009] DMAR: dmar1: reg_base_addr fed91000 ver 1:0 cap d2008c40660462 ecap f050da
[ 0.115014] DMAR: RMRR base: 0x0000009b145000 end: 0x0000009b164fff
[ 0.115018] DMAR: RMRR base: 0x0000009d000000 end: 0x0000009f7fffff
[ 0.287373] iommu: Default domain type: Passthrough (set via kernel command line)
[ 0.387530] DMAR: No ATSR found
[ 0.387533] DMAR: No SATC found
[ 0.387536] DMAR: IOMMU feature fl1gp_support inconsistent
[ 0.387537] DMAR: IOMMU feature pgsel_inv inconsistent
[ 0.387541] DMAR: IOMMU feature nwfs inconsistent
[ 0.387544] DMAR: IOMMU feature pasid inconsistent
[ 0.387547] DMAR: IOMMU feature eafs inconsistent
[ 0.387551] DMAR: IOMMU feature prs inconsistent
[ 0.387553] DMAR: IOMMU feature nest inconsistent
[ 0.387554] DMAR: IOMMU feature mts inconsistent
[ 0.387560] DMAR: IOMMU feature sc_support inconsistent
[ 0.387563] DMAR: IOMMU feature dev_iotlb_support inconsistent
[ 0.387568] DMAR: dmar0: Using Queued invalidation
[ 0.387578] DMAR: dmar1: Using Queued invalidation
[ 0.387816] pci 0000:00:02.0: Adding to iommu group 0
[ 0.387864] pci 0000:00:00.0: Adding to iommu group 1
[ 0.387885] pci 0000:00:01.0: Adding to iommu group 2
[ 0.387900] pci 0000:00:04.0: Adding to iommu group 3
[ 0.387930] pci 0000:00:08.0: Adding to iommu group 4
[ 0.387951] pci 0000:00:12.0: Adding to iommu group 5
[ 0.387975] pci 0000:00:14.0: Adding to iommu group 6
[ 0.387989] pci 0000:00:14.2: Adding to iommu group 6
[ 0.388003] pci 0000:00:14.3: Adding to iommu group 7
[ 0.388028] pci 0000:00:15.0: Adding to iommu group 8
[ 0.388042] pci 0000:00:15.1: Adding to iommu group 8
[ 0.388061] pci 0000:00:16.0: Adding to iommu group 9
[ 0.388075] pci 0000:00:17.0: Adding to iommu group 10
[ 0.388100] pci 0000:00:1d.0: Adding to iommu group 11
[ 0.388126] pci 0000:00:1d.6: Adding to iommu group 12
[ 0.388162] pci 0000:00:1f.0: Adding to iommu group 13
[ 0.388177] pci 0000:00:1f.3: Adding to iommu group 13
[ 0.388193] pci 0000:00:1f.4: Adding to iommu group 13
[ 0.388207] pci 0000:00:1f.5: Adding to iommu group 13
[ 0.388216] pci 0000:01:00.0: Adding to iommu group 2
[ 0.388223] pci 0000:01:00.1: Adding to iommu group 2
[ 0.388246] pci 0000:06:00.0: Adding to iommu group 14
[ 0.388269] pci 0000:07:00.0: Adding to iommu group 15
[ 0.388383] DMAR: Intel(R) Virtualization Technology for Directed I/O
IOMMU group containing the NVIDIA GPU:
IOMMU Group 2 00:01.0 PCI bridge [0604]: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 02)
IOMMU Group 2 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU117M [GeForce GTX 1650 Mobile / Max-Q] [10de:1f99] (rev a1)
IOMMU Group 2 01:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:10fa] (rev a1)
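For reference, a listing like the one above can be produced by walking sysfs. A self-contained sketch (the sysfs root is parameterized only so the function can be exercised against a scratch directory; in real use it defaults to the actual location):

```shell
#!/bin/bash
# List every IOMMU group and the PCI devices it contains.
list_iommu_groups() {
    local root="${1:-/sys/kernel/iommu_groups}"
    local group dev
    for group in "$root"/*/; do
        [ -d "$group" ] || continue
        for dev in "$group"devices/*; do
            [ -e "$dev" ] || continue
            echo "group $(basename "$group"): $(basename "$dev")"
        done
    done
}

list_iommu_groups   # prints one "group N: <PCI address>" line per device
```

On my machine this shows the GPU (01:00.0) and its audio function (01:00.1) sharing group 2 with the PCIe root port at 00:01.0, which matches the listing above.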
libvirt hooks tree (/etc/libvirt/hooks):
./
├── kvm.conf
├── qemu
└── qemu.d
└── win10
├── prepare
│ └── begin
│ └── bind_vfio.sh
└── release
└── end
└── unbind_vfio.sh
kvm.conf
VIRSH_GPU_VIDEO=pci_0000_01_00_0
VIRSH_GPU_AUDIO=pci_0000_01_00_1
bind_vfio.sh
#!/bin/bash
## Load the config file
source "/etc/libvirt/hooks/kvm.conf"
## Load the vfio modules
modprobe vfio
modprobe vfio_iommu_type1
modprobe vfio_pci
## Unbind the GPU from nvidia and bind it to vfio-pci
virsh nodedev-detach "$VIRSH_GPU_VIDEO"
virsh nodedev-detach "$VIRSH_GPU_AUDIO"
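For debugging, I could extend bind_vfio.sh with a check that the card actually landed on vfio-pci after the detach calls. A hypothetical helper, not part of the guide; it takes the device's sysfs directory (e.g. /sys/bus/pci/devices/0000:01:00.0):

```shell
# Print the driver currently bound to the PCI device whose sysfs
# directory is $1, or "none" if no driver is bound.
get_pci_driver() {
    local link="$1/driver"
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}

# Possible use at the end of bind_vfio.sh -- fail loudly instead of
# letting the VM start with the card still on the nvidia driver:
# [ "$(get_pci_driver /sys/bus/pci/devices/0000:01:00.0)" = "vfio-pci" ] \
#     || { echo "GPU did not bind to vfio-pci" >&2; exit 1; }
```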
unbind_vfio.sh
#!/bin/bash
## Load the config file
source "/etc/libvirt/hooks/kvm.conf"
## Unbind the GPU from vfio-pci and rebind it to nvidia
virsh nodedev-reattach "$VIRSH_GPU_VIDEO"
virsh nodedev-reattach "$VIRSH_GPU_AUDIO"
## Unload the vfio modules
modprobe -r vfio_pci
modprobe -r vfio_iommu_type1
modprobe -r vfio
qemu (dispatcher script that runs per-guest hooks)
#!/usr/bin/env bash
#
# Author: SharkWipf
#
# Copy this file to /etc/libvirt/hooks, make sure it's called "qemu".
# After this file is installed, restart libvirt.
# From now on, you can easily add per-guest qemu hooks.
# Add your hooks in /etc/libvirt/hooks/qemu.d/vm_name/hook_name/state_name.
# For a list of available hooks, please refer to https://www.libvirt.org/hooks.html
#
GUEST_NAME="$1"
HOOK_NAME="$2"
STATE_NAME="$3"
MISC="${*:4}"
BASEDIR="$(dirname "$0")"
HOOKPATH="$BASEDIR/qemu.d/$GUEST_NAME/$HOOK_NAME/$STATE_NAME"
set -e # If a hook script exits with an error, we should as well.
# If the hook path is a non-empty executable file, run it directly.
if [ -f "$HOOKPATH" ] && [ -s "$HOOKPATH" ] && [ -x "$HOOKPATH" ]; then
    "$HOOKPATH" "$@"
# If it is a directory, run every executable file directly inside it.
elif [ -d "$HOOKPATH" ]; then
    while read -r file; do
        # Skip the empty string find emits when no files match.
        if [ -n "$file" ]; then
            "$file" "$@"
        fi
    done <<< "$(find -L "$HOOKPATH" -maxdepth 1 -type f -executable -print)"
fi
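To rule out the dispatcher itself, the same lookup logic can be exercised by hand against a scratch copy of the hooks tree, so nothing touches the real system. Everything under $tmp below is illustrative:

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/qemu.d/win10/prepare/begin"

# A stand-in hook that just records the arguments it was called with.
cat > "$tmp/qemu.d/win10/prepare/begin/log_args.sh" <<'EOF'
#!/bin/bash
echo "hook ran: $1 $2 $3" > "$(dirname "$0")/ran.txt"
EOF
chmod +x "$tmp/qemu.d/win10/prepare/begin/log_args.sh"

# Minimal stand-in for the dispatcher above (same directory-lookup logic);
# libvirtd would invoke it as: qemu <guest> <hook> <state>.
run_hooks() {
    local base="$1"; shift
    local path="$base/qemu.d/$1/$2/$3"
    local f
    if [ -d "$path" ]; then
        while read -r f; do
            [ -n "$f" ] && "$f" "$@"
        done <<< "$(find -L "$path" -maxdepth 1 -type f -executable -print)"
    fi
}

run_hooks "$tmp" win10 prepare begin
cat "$tmp/qemu.d/win10/prepare/begin/ran.txt"   # hook ran: win10 prepare begin
```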
The QEMU XML config is similar to the one in the guide, except that I am only passing through the GPU and its audio function as listed above, and I am not using the NVMe passthrough shown in the guide.
Also, the host machine is a laptop, while the guide assumes a setup with multiple monitors. Could that be the main reason the host freezes? The kernel logs show vfio-pci repeatedly waiting on the PCI address of the NVIDIA GPU (0000:01:00.0).