
I upgraded the NVMe SSD in my Dell XPS 15 from a 512 GB SK Hynix PC300 to a 1 TB Kingston A2000 (SA2000). I cloned the disk with Clonezilla and then resized/moved the partitions with GParted. Everything went smoothly, without errors.

But now, on Ubuntu 20.04, the system freezes during heavier I/O operations (compiling Android apps, writing many files from scripts, sometimes just opening Chrome, etc.).

Nothing in /var/log/syslog or /var/log/kern.log corresponds to the freezes, yet the system is completely stuck: Ctrl+C does nothing in the terminal and I cannot start anything. Icons also progressively disappear from menus (as if it were trying to load them?).

I have no issues on Windows 10, even while gaming, so I suspect the problem is related to Ubuntu, but I can't really prove it.

I ran a filesystem check (fsck). The only thing it reported was "inode extent tree (at level 1) could be shorter", which I let it fix. smartctl doesn't show any errors either:

smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-58-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       KINGSTON SA2000M81000G
Serial Number:                      50026B76842D46F8
Firmware Version:                   S5Z42105
PCI Vendor/Subsystem ID:            0x2646
IEEE OUI Identifier:                0x0026b7
Controller ID:                      1
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1,000,204,886,016 [1.00 TB]
Namespace 1 Utilization:            782,120,886,272 [782 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            0026b7 6842d46f85
Local Time is:                      Tue Dec 29 11:57:29 2020 CET
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Maximum Data Transfer Size:         32 Pages
Warning  Comp. Temp. Threshold:     75 Celsius
Critical Comp. Temp. Threshold:     80 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     9.00W       -        -    0  0  0  0        0       0
 1 +     4.60W       -        -    1  1  1  1        0       0
 2 +     3.80W       -        -    2  2  2  2        0       0
 3 -   0.0450W       -        -    3  3  3  3     2000    2000
 4 -   0.0040W       -        -    4  4  4  4    15000   15000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        25 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    911,056 [466 GB]
Data Units Written:                 1,455,569 [745 GB]
Host Read Commands:                 13,570,300
Host Write Commands:                12,412,188
Controller Busy Time:               104
Power Cycles:                       35
Power On Hours:                     23
Unsafe Shutdowns:                   16
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Thermal Temp. 1 Transition Count:   41
Thermal Temp. 1 Total Time:         837

Error Information (NVMe Log 0x01, max 256 entries)
No Errors Logged
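The "Supported Power States" table above lists the power-state descriptors the drive reports, including the entry/exit latencies the kernel relies on for APST. If you want to pull that table out of saved `smartctl -a` output (for example to compare drives), a minimal Python sketch could look like this. It assumes the smartctl 7.1 column layout shown here; `parse_power_states` and `PowerState` are hypothetical names, not part of smartmontools.

```python
import re
from typing import List, NamedTuple


class PowerState(NamedTuple):
    state: int
    operational: bool       # "+" in the Op column
    max_watts: float
    entry_latency_us: int   # Ent_Lat column
    exit_latency_us: int    # Ex_Lat column


# Assumption: rows look like the smartctl 7.1 output above, e.g.
# " 3 -   0.0450W       -        -    3  3  3  3     2000    2000"
ROW = re.compile(
    r"^\s*(\d+)\s+([+-])\s+([\d.]+)W\s+\S+\s+\S+"
    r"(?:\s+\d+){4}\s+(\d+)\s+(\d+)\s*$"
)


def parse_power_states(smartctl_output: str) -> List[PowerState]:
    """Extract the 'Supported Power States' table from `smartctl -a` text."""
    states: List[PowerState] = []
    in_table = False
    for line in smartctl_output.splitlines():
        if line.startswith("Supported Power States"):
            in_table = True
            continue
        if not in_table:
            continue
        if not line.strip():
            break  # the table ends at the first blank line
        match = ROW.match(line)
        if match:  # the header row ("St Op ...") does not match and is skipped
            st, op, watts, ent, ex = match.groups()
            states.append(PowerState(int(st), op == "+", float(watts), int(ent), int(ex)))
    return states


# The table from the question, copied verbatim.
SAMPLE = """\
Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     9.00W       -        -    0  0  0  0        0       0
 1 +     4.60W       -        -    1  1  1  1        0       0
 2 +     3.80W       -        -    2  2  2  2        0       0
 3 -   0.0450W       -        -    3  3  3  3     2000    2000
 4 -   0.0040W       -        -    4  4  4  4    15000   15000
"""

if __name__ == "__main__":
    for s in parse_power_states(SAMPLE):
        kind = "operational" if s.operational else "non-operational"
        print(f"PS{s.state}: {kind}, exit latency {s.exit_latency_us} us")
```

On this drive, only PS3 and PS4 are non-operational, so those are the states APST can put it into.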

My system is up to date (`uname -a`: Linux gp2mv3-laptop 5.4.0-58-generic #64-Ubuntu SMP Wed Dec 9 08:16:25 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux).

What can I do to fix this? Is the drive incompatible with Ubuntu?

Sourav Ghosh
Gp2mv3
  • It's just a wild guess, but do you have any partition mounted with the `sync` option? It makes writes very slow. I know it shouldn't affect stability, but it can't hurt to check... – Levente Dec 29 '20 at 13:23
  • On 20.04, I also recommend `gnome-shell-extension-system-monitor`: it gives real-time insight into CPU load, I/O wait, and disk reads/writes. When the UI freezes, the graphs freeze too, conveniently showing the moments leading up to the freeze. It is highly customizable: https://askubuntu.com/questions/1302361/too-many-icons-cannot-see-the-clock/1302389#1302389 Note that the `disk` graph is not enabled by default; you need to enable it in the preferences. – Levente Dec 29 '20 at 13:30
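To follow up on the `sync` suggestion in the first comment, here is a small Python sketch that lists the mounts whose option list contains `sync` (but not, say, `dirsync`). It relies only on the standard `/proc/mounts` format (device, mount point, filesystem type, comma-separated options, ...); `sync_mounts` is a hypothetical helper name.

```python
from pathlib import Path
from typing import List, Tuple


def sync_mounts(mounts_text: str) -> List[Tuple[str, str]]:
    """Return (device, mount point) pairs whose mount options include 'sync'."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip malformed or empty lines
        device, mount_point, _fstype, options = fields[:4]
        # Exact match on the option, so "dirsync" is not reported.
        if "sync" in options.split(","):
            found.append((device, mount_point))
    return found


if __name__ == "__main__":
    try:
        text = Path("/proc/mounts").read_text()
    except FileNotFoundError:
        text = ""  # not running on Linux
    for dev, mnt in sync_mounts(text):
        print(f"{dev} is mounted with 'sync' on {mnt}")
```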

1 Answer


I found the culprit and the solution after contacting the SSD manufacturer.

Culprit

The issue is in the SSD firmware's implementation of APST (Autonomous Power State Transition). From what I understood, the SSD reports incorrect timing information to the kernel.

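APST is a power-saving feature that lets the SSD switch itself into low-power (sleep) states when idle. To configure it, the kernel relies on the wake-up delay (exit latency) that the SSD advertises for each state. The A2000 firmware advertises a shorter wake-up delay than the drive actually needs, which blocks the wake-up and leaves the SSD unresponsive, freezing all disk I/O.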
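To make the mechanism concrete, here is a simplified Python model of how the Linux NVMe driver picks APST target states: only non-operational states (the `-` rows in the smartctl table) whose advertised exit latency does not exceed `nvme_core.default_ps_max_latency_us` are used, and 0 disables APST. This is an approximation written for illustration, not the driver code itself (which has extra conditions such as quirks and idle-time calculation), and the function and constant names are hypothetical.

```python
from typing import List, NamedTuple

# Default of nvme_core.default_ps_max_latency_us in mainline kernels (microseconds).
KERNEL_DEFAULT_US = 100_000


class PowerState(NamedTuple):
    state: int
    non_operational: bool  # "-" in smartctl's Op column
    entry_latency_us: int
    exit_latency_us: int


# Copied from the smartctl "Supported Power States" table in the question.
KINGSTON_A2000 = [
    PowerState(0, False, 0, 0),
    PowerState(1, False, 0, 0),
    PowerState(2, False, 0, 0),
    PowerState(3, True, 2000, 2000),
    PowerState(4, True, 15000, 15000),
]


def apst_candidate_states(states: List[PowerState], max_latency_us: int) -> List[int]:
    """Simplified model: non-operational states APST may transition into."""
    if max_latency_us == 0:
        return []  # 0 disables APST entirely
    return [s.state for s in states
            if s.non_operational and s.exit_latency_us <= max_latency_us]


if __name__ == "__main__":
    for limit in (KERNEL_DEFAULT_US, 5500, 500, 0):
        print(limit, apst_candidate_states(KINGSTON_A2000, limit))
```

In this model, the default limit allows both PS3 and PS4, while 500 allows neither, which on this drive amounts to switching APST off.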

To solve this, you can either disable APST or limit the wake-up latency the kernel is willing to accept, instead of trusting the values advertised by the SSD.

Solution

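You have to set the `nvme_core.default_ps_max_latency_us` kernel parameter, the maximum wake-up (exit) latency, in microseconds, that the kernel accepts for an APST power state. Use 0 to disable APST entirely, or a value small enough to exclude the problematic deep states: with 500, both of this drive's non-operational states (exit latencies of 2000 and 15000 µs in the smartctl output in the question) are skipped.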
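To check which value is actually in effect after rebooting, the kernel exposes module parameters under `/sys/module/<module>/parameters/`. A minimal Python sketch that reads it (the helper names are hypothetical; `cat` on the same file works just as well):

```python
from pathlib import Path
from typing import Optional

# Standard sysfs location of the parameter; present when nvme_core is loaded.
PARAM_PATH = Path("/sys/module/nvme_core/parameters/default_ps_max_latency_us")


def read_ps_max_latency(path: Path = PARAM_PATH) -> Optional[int]:
    """Return the active value in microseconds, or None if the file is missing."""
    try:
        return int(path.read_text().strip())
    except FileNotFoundError:
        return None


def describe(value: Optional[int]) -> str:
    """Human-readable summary of the parameter value."""
    if value is None:
        return "nvme_core parameter not found (NVMe driver not loaded?)"
    if value == 0:
        return "APST disabled"
    return f"maximum accepted APST exit latency: {value} us"


if __name__ == "__main__":
    print(describe(read_ps_max_latency()))
```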

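Open /etc/default/grub and append `nvme_core.default_ps_max_latency_us=500` to the value of GRUB_CMDLINE_LINUX_DEFAULT, inside the quotes (for example `GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nvme_core.default_ps_max_latency_us=500"`). Then run `sudo update-grub` and reboot.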
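If you prefer to script that edit, here is a hedged Python sketch that sets the parameter in the `GRUB_CMDLINE_LINUX_DEFAULT` line and replaces any existing value, so it can be rerun safely. It assumes the value is double-quoted, as in Ubuntu's stock /etc/default/grub; `set_kernel_param` is a hypothetical helper.

```python
import re

PARAM = "nvme_core.default_ps_max_latency_us"

# Matches the double-quoted GRUB_CMDLINE_LINUX_DEFAULT line (not GRUB_CMDLINE_LINUX).
LINE = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")', re.M)


def set_kernel_param(grub_text: str, value: int, param: str = PARAM) -> str:
    """Return grub_text with param=value set in GRUB_CMDLINE_LINUX_DEFAULT."""

    def update(m: "re.Match[str]") -> str:
        # Drop any previous setting of the same parameter, then append the new one.
        args = [a for a in m.group(2).split() if not a.startswith(param + "=")]
        args.append(f"{param}={value}")
        return m.group(1) + " ".join(args) + m.group(3)

    new_text, count = LINE.subn(update, grub_text, count=1)
    if count == 0:
        raise ValueError("GRUB_CMDLINE_LINUX_DEFAULT line not found")
    return new_text


if __name__ == "__main__":
    example = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
    print(set_kernel_param(example, 500))
```

The sketch only transforms text; writing the file back as root and regenerating the GRUB configuration are left to you.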

The solution comes from this post: https://askubuntu.com/a/1100886/33386

Gp2mv3