
I've installed an Ubuntu 18.04.4 server hosting Docker containers, and every few days the system crashes with the following entries in the system log:

Jun 16 08:52:13 shauls_home_server kernel: watchdog: BUG: soft lockup - CPU#4 stuck for 23s! [containerd:1293]
Jun 16 08:52:41 shauls_home_server kernel: watchdog: BUG: soft lockup - CPU#4 stuck for 23s! [containerd:1293]
Jun 16 08:52:45 shauls_home_server kernel: watchdog: BUG: soft lockup - CPU#9 stuck for 22s! [rtorrent main:4407]
Jun 16 08:52:45 shauls_home_server kernel: watchdog: BUG: soft lockup - CPU#10 stuck for 22s! [runc:11721]
Jun 16 08:53:09 shauls_home_server kernel: watchdog: BUG: soft lockup - CPU#4 stuck for 23s! [containerd:1293]
Jun 16 08:53:13 shauls_home_server kernel: watchdog: BUG: soft lockup - CPU#9 stuck for 23s! [rtorrent main:4407]
Jun 16 08:53:13 shauls_home_server kernel: watchdog: BUG: soft lockup - CPU#10 stuck for 23s! [runc:11721]

I've already tried this solution: adding nouveau.modeset=0 to GRUB's Linux line.

And this: echo 20 > /proc/sys/kernel/watchdog_thresh, as suggested here.

Here is my journalctl log from just before the crash.
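For context on what that setting does: according to the kernel's sysctl documentation, the soft lockup threshold is twice `watchdog_thresh`, and the default `watchdog_thresh` is 10 seconds. That is consistent with the "stuck for 22s" messages above, and it means that writing 20 only delays the report to roughly 40 seconds rather than addressing the stall itself. A minimal sketch of that arithmetic follows; the function names are hypothetical helpers, not part of any tool.

```python
from pathlib import Path
from typing import Optional

# Standard procfs location of the setting changed by the echo command above.
WATCHDOG_THRESH = Path("/proc/sys/kernel/watchdog_thresh")


def soft_lockup_threshold(watchdog_thresh: int) -> int:
    """Seconds a CPU may be stuck before a soft lockup is reported.

    Per the kernel sysctl documentation: 2 * watchdog_thresh.
    """
    return 2 * watchdog_thresh


def read_watchdog_thresh(path: Path = WATCHDOG_THRESH) -> Optional[int]:
    """Return the current watchdog_thresh, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    current = read_watchdog_thresh()
    if current is None:
        print("watchdog_thresh is not readable on this system")
    else:
        print(f"watchdog_thresh={current}s, "
              f"soft lockup reported after {soft_lockup_threshold(current)}s")
```

Note that a value written with `echo` into `/proc/sys` does not persist across a reboot.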

Any help on how I can figure out what's causing the issue will be appreciated.
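As one possible starting point for narrowing this down, the soft lockup lines can be grouped by CPU and task to see whether the same core or the same process is always involved. The sketch below is only an illustration: the regular expression is written against the message format shown in the log above, and `summarize_soft_lockups` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the kernel watchdog message format seen in the log excerpt,
# e.g. "soft lockup - CPU#4 stuck for 23s! [containerd:1293]".
SOFT_LOCKUP = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<task>.+):(?P<pid>\d+)\]"
)

# A few lines copied from the log in the question.
SAMPLE = [
    "Jun 16 08:52:13 shauls_home_server kernel: watchdog: BUG: soft lockup - CPU#4 stuck for 23s! [containerd:1293]",
    "Jun 16 08:52:45 shauls_home_server kernel: watchdog: BUG: soft lockup - CPU#9 stuck for 22s! [rtorrent main:4407]",
    "Jun 16 08:52:45 shauls_home_server kernel: watchdog: BUG: soft lockup - CPU#10 stuck for 22s! [runc:11721]",
    "Jun 16 08:53:09 shauls_home_server kernel: watchdog: BUG: soft lockup - CPU#4 stuck for 23s! [containerd:1293]",
]


def summarize_soft_lockups(lines):
    """Count soft lockup reports per (cpu, task name, pid)."""
    counts = Counter()
    for line in lines:
        match = SOFT_LOCKUP.search(line)
        if match:
            key = (int(match["cpu"]), match["task"], int(match["pid"]))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    for (cpu, task, pid), n in summarize_soft_lockups(SAMPLE).most_common():
        print(f"CPU#{cpu} {task} (pid {pid}): {n} report(s)")
```

The same function could be fed the kernel messages from `journalctl -k` instead of the embedded sample. If many unrelated tasks on different CPUs stall at the same moment, as they do here, that may point toward hardware, firmware, or power management rather than a single misbehaving process.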

Arnab Nandy
Shaulliv
  • I've updated the bios on Saturday, waiting to see if the issue happens again. – Shaulliv Jun 22 '20 at 11:30
  • The system hung over the weekend, but there is nothing in the log. I'm planning to wait for at least another week before I do anything. – Shaulliv Jun 28 '20 at 11:38
  • The system still crashes but now without writing anything to the log. I also tried changing the GPU to an AMD card, without success. – Shaulliv Jul 07 '20 at 10:13
  • Setting the threshold to 20 seems like it won't help if the complaints you're getting are for 22 or 23 seconds. FWIW I'm experiencing this problem as well and still seeking resolution. – pattivacek Sep 11 '20 at 05:51
  • The bios update seems to have fixed the stuck CPU problem. – Shaulliv Oct 11 '20 at 19:06
  • ...contd. As for the crashing in general, I found out that the issue is an "AHCI controller unavailable" error. I've since changed to CentOS, so I'm opening a new question on Super User. – Shaulliv Oct 11 '20 at 19:12
  • Can you transfer this question there, or post a link to the new question? – pattivacek Oct 12 '20 at 08:28
  • Sure, here's the link: [https://superuser.com/questions/1592526/machine-crashes-with-ahci-controller-unavailable-and-not-using-asmedia-controll](https://superuser.com/questions/1592526/machine-crashes-with-ahci-controller-unavailable-and-not-using-asmedia-controll) – Shaulliv Oct 12 '20 at 22:22
  • So, I found the problem. It was the Ryzen C6 state bug. Since I wasn't able to mitigate the issue (despite it having workarounds), I just bought an Intel CPU (and a compatible motherboard). The PC has now run for two weeks with no problems. If anyone is interested, here's the bug report on the Ryzen C6 state bug: [https://bugzilla.kernel.org/show_bug.cgi?id=196683](https://bugzilla.kernel.org/show_bug.cgi?id=196683) – Shaulliv Nov 01 '20 at 13:44
  • Well, I have an Intel Xeon so that definitely isn't why I'm hitting this bug. Oh well. – pattivacek Nov 02 '20 at 06:49
  • The core issue with Ryzen is poor quality control (in some cases people reported RMAing the CPU helped - the new one worked). So maybe your Xeon has some manufacturing flaw? – Shaulliv Nov 02 '20 at 13:20

2 Answers


BIOS

Gigabyte Technology Co., Ltd. A320M-H/A320M-H-CF

You have BIOS F40, dated 06/28/2019.

There's a newer BIOS available, F50, dated 11/28/2019, which can be downloaded here.

Note: Verify that I have the correct web page for your model number.

Note: Have good backups before updating the BIOS.

heynnema

I have a Xeon CPU and the exact same problem, which prevents it from booting into Linux Mint 20 or Windows 10. What helped me was going to

BIOS > Chipset > Advanced Power Management Configuration > CPU C State Control, then enabling CPU C3 Report and disabling CPU C6 Report.

(I have a basic AMI BIOS; your path may vary.)
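On a system that does boot into Linux, one way to see which idle states the kernel currently exposes is to read the cpuidle entries in sysfs. The following is a minimal sketch, assuming the standard `/sys/devices/system/cpu/cpu0/cpuidle/stateN/` layout with `name` and `disable` files; `list_idle_states` is a hypothetical helper, and the state names shown by the kernel may not map one-to-one onto the labels used in the BIOS menu.

```python
from pathlib import Path

# Standard sysfs location of the cpuidle states for the first CPU.
CPUIDLE_DIR = Path("/sys/devices/system/cpu/cpu0/cpuidle")


def list_idle_states(cpuidle_dir=CPUIDLE_DIR):
    """Return [(state dir name, state name, disabled flag), ...] in index order."""
    cpuidle_dir = Path(cpuidle_dir)
    if not cpuidle_dir.is_dir():
        return []  # cpuidle not available, or not running on Linux
    state_dirs = sorted(
        cpuidle_dir.glob("state[0-9]*"),
        key=lambda p: int(p.name[len("state"):]),
    )
    states = []
    for state_dir in state_dirs:
        try:
            name = (state_dir / "name").read_text().strip()
            disabled = (state_dir / "disable").read_text().strip() != "0"
        except OSError:
            continue  # skip entries that cannot be read
        states.append((state_dir.name, name, disabled))
    return states


if __name__ == "__main__":
    found = list_idle_states()
    if not found:
        print("no cpuidle states found")
    for dir_name, name, disabled in found:
        print(f"{dir_name}: {name} ({'disabled' if disabled else 'enabled'})")
```

Comparing this listing before and after changing the BIOS setting may help confirm whether the change is visible to the operating system.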

Flippy