Linux System Maintenance and Error Review

Review Linux system freezes and runtime failures, inspect previous boot logs, check driver and package state, and safely update an Arch Linux workstation.

This workflow is a reusable maintenance reference for an Arch Linux workstation. It focuses on runtime freezes, previous boot log review, kernel and driver diagnostics, package checks, and safe full-system updates.

Overview

  • Use previous boot logs after a hard power-off, freeze, crash, or unexpected shutdown.
  • Start with journalctl before changing packages, services, drivers, or kernel options.
  • Separate kernel, driver, firmware, memory, power, service, and package evidence.
  • Use targeted grep patterns to find lockups, hardware errors, GPU errors, memory pressure, and shutdown events.
  • Check installed kernel and driver package versions before assuming a configuration problem.
  • Use pacman -Syu for Arch kernel, firmware, and driver updates instead of updating one package by hand.
  • After reboot, verify the new kernel, relevant packages, current boot logs, and affected subsystem state.

Check the previous boot after restart

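After a freeze and reboot, use -b -1 to inspect the previous boot. The current boot only contains messages logged after the restart, so the evidence for the freeze is in the previous one. This requires a persistent journal; if -b -1 reports no data, journald is keeping logs in memory only.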

journalctl -b -1 -p err

journalctl -b -1 -e

journalctl -b -1 -n 100

If the journal ends suddenly near the freeze time and has no clean shutdown sequence, the system likely locked before it could write more logs.
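One way to apply this check without reading the whole journal is to test whether the last lines of the previous boot contain shutdown markers. The sketch below is a minimal helper under assumptions: the function name ended_cleanly is hypothetical, and the marker strings are guesses at what a clean systemd shutdown logs, so compare them with a known-clean boot on the same machine first.

```shell
# Sketch: decide whether a boot's journal ends with a shutdown sequence.
# ASSUMPTION: the marker strings below are illustrative; adjust them to
# what a clean shutdown looks like in your own journal.
ended_cleanly() {
    # Reads journal text on stdin and inspects only the last 50 lines.
    tail -n 50 |
        grep -qiE "journal stopped|reached target.*(shutdown|power.off|reboot)|systemd-shutdown"
}

# Usage (not run here):
#   if journalctl -b -1 --no-pager | ended_cleanly; then
#       echo "previous boot shut down cleanly"
#   else
#       echo "previous boot ended abruptly"
#   fi
```

An abrupt ending is only a hint. It matches a hard lock, but also a power loss or a held power button.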

Inspect the event time window

Use a narrow time window around the event. This keeps the output readable and makes the cause easier to separate from old warnings.

journalctl -b -1 --since "YYYY-MM-DD HH:MM" --until "YYYY-MM-DD HH:MM"

journalctl -k -b -1 --since "YYYY-MM-DD HH:MM" --until "YYYY-MM-DD HH:MM"

Use the kernel-only command when the issue looks like a GPU, driver, firmware, suspend, power, or hardware problem.
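If only the approximate freeze time is known, the window bounds can be computed instead of typed by hand. This is a small sketch, assuming GNU date (the default on Arch); the function name event_window and the default of 10 minutes are hypothetical choices.

```shell
# Sketch: print the start and end of a window of +/- N minutes around a
# timestamp, one per line, in the format journalctl accepts.
# ASSUMPTION: GNU date is available (date -d is not portable to BSD date).
event_window() {
    local event="$1" minutes="${2:-10}" epoch
    # Go through epoch seconds to avoid date's ambiguous parsing of
    # relative offsets appended to a timestamp.
    epoch=$(date -d "$event" +%s) || return 1
    date -d "@$((epoch - minutes * 60))" '+%Y-%m-%d %H:%M'
    date -d "@$((epoch + minutes * 60))" '+%Y-%m-%d %H:%M'
}

# Usage (not run here), with the event time filled in:
#   since=$(event_window "YYYY-MM-DD HH:MM" 10 | sed -n 1p)
#   until=$(event_window "YYYY-MM-DD HH:MM" 10 | sed -n 2p)
#   journalctl -k -b -1 --since "$since" --until "$until"
```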

Search for hard-freeze indicators

Search the previous boot for common GPU, memory, and kernel lockup terms. This is the fastest way to move from a general freeze report to a specific subsystem.

journalctl -b -1 | grep -iE "nvidia|amdgpu|i915|nouveau|NVRM|Xid|GSP|gpu|drm|xorg|oom|out of memory|killed process|watchdog|soft lockup|hard lockup|hung task|blocked for more than|kernel panic"

journalctl -k -b -1 | grep -iE "nvidia|amdgpu|i915|nouveau|NVRM|Xid|GSP|gpu|drm|watchdog|lockup|hung|panic|blocked"

GPU heartbeat, Xid, DRM, or firmware messages usually point to the kernel, driver, or firmware layer rather than the desktop session. OOM or killed process messages point to memory pressure instead.
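To see at a glance which subsystem dominates, the same terms can be counted per group. The following is an illustrative sketch only: the function name summarize_hits and the three-way grouping are assumptions, and the patterns are taken from the search commands above.

```shell
# Sketch: count matching journal lines per subsystem group.
# Reads journal text on stdin and prints "<group> <line count>".
# ASSUMPTION: the grouping into gpu/memory/lockup is illustrative.
summarize_hits() {
    local log name pattern
    log=$(cat)
    while IFS='=' read -r name pattern; do
        printf '%s %s\n' "$name" \
            "$(printf '%s\n' "$log" | grep -ciE "$pattern")"
    done <<'EOF'
gpu=nvidia|amdgpu|i915|nouveau|NVRM|Xid|GSP|gpu|drm
memory=oom|out of memory|killed process
lockup=watchdog|soft lockup|hard lockup|hung task|blocked for more than|kernel panic
EOF
}

# Usage (not run here):
#   journalctl -b -1 --no-pager | summarize_hits
```

A high gpu count alone is not evidence of a fault, because normal driver load messages also match; read the matching lines before drawing a conclusion.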

Interpret common log patterns

| Log pattern | Meaning | First action |
| --- | --- | --- |
| GPU heartbeat timed out | The GPU driver or firmware path stopped responding. | Update kernel and GPU packages, then monitor. |
| NVRM or Xid | The Nvidia driver reported a GPU driver, firmware, or hardware event. | Inspect the exact message and compare with recent driver changes. |
| amdgpu or i915 reset | The AMD or Intel graphics driver attempted recovery or reset. | Inspect kernel logs and recent kernel or firmware updates. |
| Out of memory or Killed process | The kernel terminated a process due to memory pressure. | Check RAM, swap, browser tabs, containers, and long-running processes. |
| watchdog, hard lockup, soft lockup | The kernel detected a stuck CPU or blocked kernel path. | Inspect kernel logs and recent driver or kernel updates. |
| User-session or service warnings | Often unrelated application, desktop, or user-service messages. | Do not treat as the cause unless they match the failure time and repeat with the failure. |
| Service messages after power key | A service reacted to shutdown or power-off handling. | Treat as shutdown handling unless it appears before the failure. |

Check current GPU state

After reboot, confirm that the GPU driver is loaded and the GPU is visible. Use the tool that matches the installed graphics stack.

nvidia-smi
nvidia-smi -q | grep -i "GSP"

lspci -nn | grep -Ei "vga|3d|display|amd|intel|nvidia"

For Nvidia systems, nvidia-smi should show the driver version, GPU name, display state, memory usage, and active graphics processes such as Xorg, a Wayland compositor, a browser, or the desktop shell.
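Picking the tool that matches the graphics stack can start from the loaded kernel modules. The sketch below assumes the four driver names used elsewhere in this reference; the function name loaded_gpu_drivers is hypothetical.

```shell
# Sketch: list loaded GPU kernel drivers from lsmod-style input on stdin.
# Only exact module names are matched, so helper modules such as
# nvidia_drm or the generic drm module are not reported.
loaded_gpu_drivers() {
    awk '$1 ~ /^(nvidia|amdgpu|i915|nouveau)$/ { print $1 }'
}

# Usage (not run here):
#   for drv in $(lsmod | loaded_gpu_drivers); do
#       case "$drv" in
#           nvidia) nvidia-smi ;;
#           *)      lspci -k | grep -A3 -Ei "vga|3d|display" ;;
#       esac
#   done
```

For the non-Nvidia case, lspci -k prints the kernel driver in use under each device, which confirms that the GPU is bound to the expected driver.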

Check installed kernel and graphics packages

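On Arch, the kernel and graphics driver packages should be upgraded together, because an out-of-tree kernel module such as the Nvidia driver has to match the installed kernel. Always check both sides when debugging a graphics freeze.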

uname -r

pacman -Qs '^linux$'
pacman -Qs '^linux-lts$'

pacman -Qs '^nvidia-open$'
pacman -Qs '^nvidia-utils$'
pacman -Qs '^opencl-nvidia$'
pacman -Qs '^cuda$'

pacman -Qs '^mesa$'
pacman -Qs '^vulkan-radeon$'
pacman -Qs '^vulkan-intel$'
pacman -Qs '^linux-firmware$'

The exact package names depend on the GPU vendor and Arch packaging state. Do not assume an old package name is available before checking pacman.
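A related check is whether the running kernel still has its module directory on disk. After a kernel upgrade without a reboot, the directory for the old release is normally gone and new modules cannot be loaded. This is a heuristic sketch; the function name running_kernel_has_modules is hypothetical, and the second argument exists only to make the path overridable.

```shell
# Sketch: succeed if the module directory for a kernel release exists.
# Defaults to the running kernel and /usr/lib/modules.
# ASSUMPTION: a missing directory means the kernel package was upgraded
# or removed since boot; treat the result as a hint, not proof.
running_kernel_has_modules() {
    local release="${1:-$(uname -r)}" moddir="${2:-/usr/lib/modules}"
    [ -d "$moddir/$release" ]
}

# Usage (not run here):
#   running_kernel_has_modules || echo "reboot before debugging drivers"
```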

Check available driver packages

If a package name is not found, verify enabled repositories and package databases before removing anything. Do not remove the active graphics driver until the replacement package is confirmed.

grep -nE "^\[|^Include" /etc/pacman.conf

sudo pacman -Syy

pacman -Ss '^nvidia$'
pacman -Si nvidia
pacman -Si nvidia-open

pacman -Ss nvidia | grep -E '^extra/nvidia|^extra/nvidia-open|^extra/nvidia-dkms|^extra/nvidia-lts'

pacman -Ss mesa
pacman -Ss vulkan

If pacman cannot find a package, stop and inspect the repository state. Do not proceed with partial package changes.
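The "confirm before removing" rule can be checked in one pass. A minimal sketch, assuming the sync databases are already refreshed; the function name missing_packages is hypothetical, and it relies only on pacman -Si returning a non-zero status for unknown packages.

```shell
# Sketch: print every package name that is NOT available in the enabled
# sync repositories. Prints nothing when all names resolve.
missing_packages() {
    local pkg
    for pkg in "$@"; do
        pacman -Si "$pkg" >/dev/null 2>&1 || printf '%s\n' "$pkg"
    done
}

# Usage (not run here):
#   missing=$(missing_packages nvidia-open nvidia-utils linux-firmware)
#   [ -z "$missing" ] || echo "not in repositories: $missing"
```

If the list is not empty, stop there and inspect the repository state as described above.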

Check GPU driver parameters

Use driver parameter files to inspect loaded options. The exact paths and values depend on the driver.

grep -Ei "EnableGpuFirmware|EnableGpuFirmwareLogs|Preserve|DynamicPowerManagement" /proc/driver/nvidia/params

modinfo nvidia | grep -iE "firmware|power|modeset"

lsmod | grep -Ei "nvidia|amdgpu|i915|drm"

Driver parameters are context, not a fix by themselves. Change them only after the logs point to a specific driver path.
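For drivers that expose their options through sysfs, the current values can be dumped generically. This is a sketch under assumptions: the function name module_params is hypothetical, not every module exposes a parameters directory, and some values are readable only by root.

```shell
# Sketch: print "name=value" for each readable parameter of a loaded
# kernel module, read from /sys/module/<module>/parameters.
# The second argument overrides the sysfs base path (illustrative only).
module_params() {
    local module="$1" base="${2:-/sys/module}" file
    for file in "$base/$module/parameters"/*; do
        # Skips unreadable files and the unmatched glob itself.
        [ -r "$file" ] || continue
        printf '%s=%s\n' "${file##*/}" "$(cat "$file" 2>/dev/null)"
    done
}

# Usage (not run here):
#   module_params amdgpu
#   module_params i915
#   module_params nvidia_drm
```

Saving this output before and after a change gives a record of what was actually in effect during a failed boot.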

Check memory and swap

Check memory and swap after freezes, browser-heavy sessions, container workloads, local builds, virtual machines, or long uptime. Memory pressure can look like a desktop or application freeze.

free -h

swapon --show

journalctl -b -1 | grep -iE "oom|out of memory|killed process|memory allocation failure" | tail -120

If the previous boot shows OOM or killed process messages, focus on RAM, swap, containers, browsers, and long-running processes before changing graphics drivers.
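For a quick numeric view of memory headroom, the kernel's own estimate in /proc/meminfo can be turned into a percentage. A minimal sketch; the function name mem_available_pct is hypothetical, and any threshold applied to the result is a judgment call rather than a fixed rule.

```shell
# Sketch: print MemAvailable as an integer percentage of MemTotal.
# Reads /proc/meminfo by default; a different file can be passed.
mem_available_pct() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", avail * 100 / total }' \
        "${1:-/proc/meminfo}"
}

# Usage (not run here); the 10 percent threshold is an example value:
#   [ "$(mem_available_pct)" -ge 10 ] || echo "low memory headroom"
```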

Check failed services

Failed services can explain missing hardware controls, network failures, audio failures, mount issues, or background maintenance failures. Check them separately from kernel logs.

systemctl --failed

systemctl status service-name --no-pager -l

journalctl -u service-name -b --no-pager -n 100

Replace service-name with the service shown by systemctl --failed. If a service failed after the power key or during shutdown, treat it as shutdown context unless it appeared before the failure.
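When several units have failed, the per-service steps can be looped. The sketch below is one possible arrangement, not a required procedure; the function name show_failed_units and the default of 20 lines are hypothetical.

```shell
# Sketch: for every failed unit, print a header and the last N journal
# lines from the current boot.
show_failed_units() {
    local lines="${1:-20}" unit
    # --plain drops the status bullet so the unit name is column one.
    systemctl --failed --no-legend --plain | awk 'NF { print $1 }' |
        while read -r unit; do
            printf '== %s ==\n' "$unit"
            journalctl -u "$unit" -b --no-pager -n "$lines"
        done
}

# Usage (not run here):
#   show_failed_units 50
```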

Review current boot for fresh errors

After reboot, check whether the current boot is clean or already showing the same failure pattern.

journalctl -k -b | grep -iE "NVRM|Xid|GSP|nvidia|amdgpu|i915|gpu|drm|watchdog|lockup|hung|panic|blocked|oom|out of memory" | tail -120

Normal driver load messages are expected. Repeated heartbeat timeouts, Xid events, reset loops, OOM events, watchdog events, or hard lockup messages are not.

Run a safe Arch system update

On Arch, update the full system instead of updating only the kernel or only one driver. Arch does not support partial upgrades, and a full upgrade keeps the kernel, modules, firmware, and user-space packages aligned.

sudo pacman -Syu

sudo reboot

After reboot, verify the new versions and confirm the affected subsystem still works.

uname -r

pacman -Qs '^linux$'
pacman -Qs '^linux-lts$'

pacman -Qs '^nvidia-open$'
pacman -Qs '^nvidia-utils$'

nvidia-smi 2>/dev/null || true

systemctl --failed
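To see which kernel and driver packages the update actually changed, the pacman log can be filtered. This is an illustrative sketch: the function name recent_upgrades is hypothetical, the package list is an assumption based on the packages named in this reference, and the pattern assumes the usual "[ALPM] upgraded name (old -> new)" log line format.

```shell
# Sketch: show the most recent install, upgrade, and removal entries for
# kernel, firmware, and graphics packages from the pacman log.
# ASSUMPTION: the package name list is illustrative; extend it as needed.
recent_upgrades() {
    local log="${1:-/var/log/pacman.log}" count="${2:-20}"
    grep -E '\[ALPM\] (upgraded|installed|removed) (linux|linux-lts|linux-firmware[^ ]*|mesa|nvidia[^ ]*) ' "$log" |
        tail -n "$count"
}

# Usage (not run here):
#   recent_upgrades
```

Comparing these timestamps with the time of the first freeze shows whether the problem started after a specific package change.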

Post-update validation

After an update, compare the current boot logs with the failed previous boot. The goal is not to remove every warning. The goal is to confirm the fatal failure pattern is gone.

journalctl -k -b | grep -iE "NVRM|Xid|GSP|nvidia|amdgpu|i915|gpu|drm|watchdog|lockup|hung|panic|blocked|oom|out of memory" | tail -120

nvidia-smi 2>/dev/null || true

systemctl --failed

  • Good sign: the affected driver or service loads without repeating the previous failure pattern.
  • Good sign: the current boot has no repeated lockup, OOM, reset, or watchdog messages.
  • Good sign: systemctl --failed does not show a service related to the failure.
  • Watch item: isolated firmware or platform warnings can remain without causing a failure.
  • Bad sign: repeated heartbeat, reset, Xid, OOM, watchdog, or hard lockup messages return.
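The good and bad signs above can be approximated by comparing how many failure lines each boot contains. The sketch below is a rough aid, not a diagnosis: the function names are hypothetical, and the pattern deliberately omits plain driver names such as nvidia or drm so that normal load messages are not counted.

```shell
# Sketch: count failure-pattern lines on stdin, then compare two counts.
FAILURE_PATTERN='NVRM|Xid|GSP|watchdog|lockup|hung|panic|blocked|oom|out of memory'

count_failure_lines() {
    # grep -c prints 0 and exits non-zero when nothing matches.
    grep -ciE "$FAILURE_PATTERN" || true
}

compare_counts() {
    local prev="$1" cur="$2"
    if [ "$cur" -eq 0 ]; then
        echo "clean"
    elif [ "$cur" -lt "$prev" ]; then
        echo "improved"
    else
        echo "not improved"
    fi
}

# Usage (not run here):
#   prev=$(journalctl -k -b -1 --no-pager | count_failure_lines)
#   cur=$(journalctl -k -b --no-pager | count_failure_lines)
#   compare_counts "$prev" "$cur"
```

A short uptime on the current boot makes a low count less meaningful, so repeat the comparison after the system has run under its normal workload.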

Live monitoring during test runtime

After a driver, kernel, firmware, or service change, monitor for the same failure pattern during normal use. This is most useful during the first 24 to 48 hours after a change.

watch -n 60 'journalctl -k -b | grep -iE "NVRM|Xid|GSP|nvidia|amdgpu|i915|gpu|drm|watchdog|lockup|hung|panic|blocked|oom|out of memory" | tail -40'

If repeated subsystem or kernel errors appear while the system is still usable, save work and reboot before the session fully locks.
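An alternative to polling with watch is to follow the kernel log and append matching lines to a file as they arrive, so that some evidence exists even if the session locks. A minimal sketch; the function name record_matches and the log file path are hypothetical, and a hard freeze can still lose the last lines before they reach the disk.

```shell
# Sketch: filter failure-pattern lines from stdin, print them, and append
# them to a file. --line-buffered makes grep emit each match immediately.
record_matches() {
    local out="$1"
    grep --line-buffered -iE \
        "NVRM|Xid|GSP|watchdog|lockup|hung|panic|blocked|oom|out of memory" |
        tee -a "$out"
}

# Usage (not run here); the file name is an example:
#   journalctl -k -b -f | record_matches "$HOME/kernel-watch.log"
```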

Maintenance decision table

| Situation | Action |
| --- | --- |
| One freeze after long uptime and an old driver or kernel version | Update the full system, reboot, and monitor. |
| Same subsystem error returns after update | Save logs from the failed boot and evaluate driver, kernel, firmware, power profile, service, or package alternatives. |
| Current boot has only normal load messages | Keep the setup and monitor during normal work. |
| Current boot has platform firmware warnings only | Track them, but do not treat them as the freeze cause by themselves. |
| Current boot has OOM or killed process messages | Check memory, swap, containers, browser load, build jobs, and long-running processes. |
| A related service is failed | Inspect systemctl status and journalctl -u for that service before changing driver packages. |
| TTY or SSH works during freeze | Inspect logs and restart the affected service or reboot cleanly. |
| TTY and SSH are unavailable | Use controlled reboot if possible, then inspect previous boot logs. |
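Saving logs from a failed boot, as the table suggests, is easiest to do right after the restart, before journal rotation can discard them. The sketch below is one way to do it; the function name save_boot_log, the default directory, and the file naming are all hypothetical.

```shell
# Sketch: write the full journal of one boot to a timestamped file and
# print the file name. Defaults to the previous boot (-1).
save_boot_log() {
    local boot="${1:--1}" dir="${2:-$HOME/boot-logs}" file
    mkdir -p "$dir" || return 1
    file="$dir/boot${boot}-$(date +%Y%m%d-%H%M%S).log"
    journalctl -b "$boot" --no-pager > "$file" && printf '%s\n' "$file"
}

# Usage (not run here):
#   save_boot_log -1
```

Keeping one file per failed boot makes it possible to compare failures across updates after the journal itself has rotated.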

Command summary

journalctl -b -1 -p err
journalctl -b -1 -e
journalctl -k -b -1 | grep -iE "NVRM|Xid|GSP|nvidia|amdgpu|i915|gpu|drm|watchdog|lockup|hung|panic|blocked|oom|out of memory"

uname -r
pacman -Qs '^linux$'
pacman -Qs '^linux-lts$'

lspci -nn | grep -Ei "vga|3d|display|amd|intel|nvidia"

lsmod | grep -Ei "nvidia|amdgpu|i915|drm"

free -h
swapon --show

systemctl --failed

sudo pacman -Syu
sudo reboot

journalctl -k -b | grep -iE "NVRM|Xid|GSP|nvidia|amdgpu|i915|gpu|drm|watchdog|lockup|hung|panic|blocked|oom|out of memory" | tail -120