About a year ago, I acquired a Threadripper-based desktop computer with the following hardware:

Samsung 970 EVO Plus 500GB - Solid state drive
ASRock X399M TAICHI (latest firmware)
Gigabyte Radeon RX 580 GAMING 4GB
AMD Ryzen Threadripper 2950X - Processor
Corsair Vengeance LPX (32GB)

I've been using NixOS on this machine lately and Arch earlier. The current configuration of this system is:

Linux quasar-nixos-tr 5.4.6 #1-NixOS SMP Sat Dec 21 10:05:23 UTC 2019 x86_64 GNU/Linux

However, this system is far from stable. It constantly crashes with a hard lockup, and the only way to recover is a hard poweroff. Dropping to a tty doesn't work, and I cannot ssh into the machine either.

I'd really appreciate any hints on where to look to fix this. Powering off or rebooting this system also often results in a kernel panic, one of which I managed to capture: https://i.stack.imgur.com/nEVKC.jpg

I've also run a memtest, which revealed no issues with my memory. Sorting through logs has also revealed nothing so far.


Things checked so far:

  • rdrand bug: my system is unaffected

I've given up on this. The 2950X is just not viable on Linux. What I did:

  • I RMA'd the processor and had a stable system for a few weeks, after which the recurring crashes came back.
  • I've tried changing power settings in the BIOS, to no avail.
  • I finally sold the processor and got myself a 3950X. That meant quite a financial hit for me.

I now have a stable system, and I think my current system is at least as performant as the previous TR system.

Ashesh
  • Check for the [rdrand bug](https://arstechnica.com/gadgets/2019/10/how-a-months-old-amd-microcode-bug-destroyed-my-weekend/). Update BIOS to the latest version if you're not using it yet. – gronostaj Mar 29 '20 at 10:05
  • You can also check [Random Soft Lockup on new Ryzen build](https://bugzilla.kernel.org/show_bug.cgi?id=196683), more specifically https://bugzilla.kernel.org/show_bug.cgi?id=196683#c339 and https://bugzilla.kernel.org/show_bug.cgi?id=196683#c474 (Threadripper 2950X) – A.B Mar 29 '20 at 10:47
  • @gronostaj thanks! I'll check that and report back! – Ashesh Mar 29 '20 at 10:52
  • @gronostaj it seems to me though that the microcode must be updated through the BIOS, right? If so, I've updated my BIOS to the latest version, and if I'm suffering from this issue, then by my understanding I'll need to upgrade the microcode? – Ashesh Mar 29 '20 at 11:43
  • @gronostaj so, as advised in that blog post, I downloaded the rdrand test script and ran it. My system is unaffected by this issue. https://gist.github.com/asheshambasta/df826c1395d2aa035b74130bb7f256d8 – Ashesh Mar 29 '20 at 11:57
