  • Processor: Intel(R) Core(TM) i7-10700KF CPU @ 3.80GHz
  • Memory: 32770MB (4395MB used)
  • Motherboard: PRIME Z490-A
  • Graphics Card: RTX 2060
  • Operating System: Pop!_OS 22.04 LTS

For the last two days, when I return to my workstation in the morning (I leave my system powered up overnight), my system appears to have been restarted.

My previously open applications are all closed as if the system was restarted. But I know that it didn’t restart because the logs say so.

The dmesg output shows some interesting things, but no shutdown or startup messages other than the manual restart I perform after arriving and finding my applications closed, not functioning, or acting erratically.

I’m not sure what this is but it seems like an issue with memory. I did a full hardware test on my system and found no trouble. I was wondering if someone could help me diagnose this. Here is my DMESG log on Pastebin.

Please note when I return to my workstation in the morning, at around 7:00-8:00am, the logs show my manual restart. I have to do this because of the erratic behavior.

Here are the last dozen or so lines of the DMESG log:

Mar 03 00:00:01.012586 pop-os kernel: audit: type=1400 audit(1677830401.008:63): apparmor="DENIED" operation="capable" profile="/usr/sbin/cupsd" pid=1365188 comm="cupsd" capability=12  capname="net_admin"
Mar 03 07:57:21.044559 pop-os kernel: rfkill: input handler enabled
Mar 03 07:57:22.280555 pop-os kernel: hub 1-6:1.0: USB hub found
Mar 03 07:57:22.280701 pop-os kernel: hub 1-6:1.0: 4 ports detected
Mar 03 07:57:23.648558 pop-os kernel: ntfs3: Unknown parameter 'windows_names'
Mar 03 07:57:23.852585 pop-os kernel: rfkill: input handler disabled
Mar 03 08:04:21.444861 pop-os kernel: rfkill: input handler enabled
Mar 03 08:04:24.060560 pop-os kernel: rfkill: input handler disabled
Mar 03 08:04:51.664560 pop-os kernel: rfkill: input handler enabled
Mar 03 08:04:51.692659 pop-os kernel: EXT4-fs (sdc4): unmounting filesystem.
Mar 03 08:04:51.720558 pop-os kernel: EXT4-fs (sde1): unmounting filesystem.
Mar 03 08:04:53.208616 pop-os systemd-shutdown[1]: Syncing filesystems and block devices.
Mar 03 08:04:53.228718 pop-os systemd-shutdown[1]: Sending SIGTERM to remaining processes...
Giacomo1968
trinsic
  • Advice on finding where a restart is done (or if there is any software causing restarts via the `reboot` command): - Find last boot time with: `who -b`, or `uptime` (or `tuptime -t`) to see how long it has been up - Check `/var/log/messages` around this time - Search for restart in cron scripts: `grep -re "sudo reboot" /etc/cron.d/` - If desperate, search everywhere: `grep -re "sudo reboot" /etc/ /var/` If you can't find any software that could be the cause, you should perhaps look for hardware causes (perhaps power). – harrymc Mar 04 '23 at 18:06
  • My answer was deleted and replaced by the above confusing comment. Try to understand it in its degraded state. In addition: as a first hardware test, I would suggest running [MemTest86](https://www.memtest86.com/) at least overnight. It will test the RAM, but also general system functions. (Add `@harrymc` to your comment for me to be notified.) – harrymc Mar 04 '23 at 21:09
  • Your kernel log does not contain a reboot. The system was continuously running. Unfortunately, you do not describe in detail what led you to believe your system was restarted. – Daniel B Mar 04 '23 at 21:34
  • I did describe in detail. My applications were all closed, AS IF, it restarted. But I know that it didn't restart because the logs say so. Also I pasted my entire DMESG log and posted a link to it via pastebin for the last 4 restarts. Please read my entire message. Thank you. – trinsic Mar 04 '23 at 23:47
  • Please read the comment in which I ask for the rest of the log, **besides dmesg.** You will not find anything useful in dmesg for this situation – it contains none of the userspace messages where the shutdown or restart would be initiated; the only thing it will have is the final steps where the kernel is finally being asked to power off. (Please also check your pastebin again and note that it only contains one boot, not four.) – u1686_grawity Mar 05 '23 at 00:02
  • user16 this is the command I used to gather the logging info: `sudo journalctl -o short-precise -k -b -2`. Please specify what I need to do to get you what you need. – trinsic Mar 05 '23 at 00:51
  • @harrymc - I will run MemTest86 tonight before I leave. It's 32 GB of RAM, so it's going to take a while to do the full test. I'll report back. – trinsic Mar 05 '23 at 00:57
  • @trinsic: Check also the SMART attributes of the disk. If this and MemTest86 come out as clean, then this is likely not a hardware error on your computer (but the possibility of it coming from your environment still stays). – harrymc Mar 05 '23 at 08:55
  • Drop the `-k` in your `journalctl` invocation. Of course, the resulting log will be much too long, so you’ll have to look through it yourself first, then provide spots of (potential) interest. I also recommend narrowing down the log using time filters instead of the “boot ID” filter. You know the night it happened, you should make use of that information. – Daniel B Mar 05 '23 at 09:04
  • @trinsic: Regarding the message in your log for `ntfs3: Unknown parameter 'windows_names'`, see [this answer](https://askubuntu.com/a/1424067/963426). This may mean just an NTFS mount problem after the reboot, but might also relate to the basic problem (we don't know at this time). – harrymc Mar 05 '23 at 09:10
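
A minimal sketch of the time-window approach suggested by Daniel B above, assuming a systemd journal is available. The helper names `night_log` and `suspicious_lines`, the example time stamps, and the keyword list are illustrative assumptions; none of them come from the logs in this thread.

```shell
#!/usr/bin/env bash

# night_log (hypothetical helper): print every journal entry between two
# time stamps. There is no -k here, so userspace messages are included
# alongside the kernel ones.
night_log() {
    local start=$1 end=$2
    journalctl --since "$start" --until "$end" -o short-precise --no-pager
}

# suspicious_lines (hypothetical helper): keep only lines containing phrases
# that often accompany crashing applications. The keyword list is a guess,
# not an exhaustive or authoritative set.
suspicious_lines() {
    grep -Ei 'out of memory|oom-killer|segfault|core dumped|general protection|machine check'
}

# Usage (adjust the time stamps to the night in question):
#   night_log "2023-03-02 18:00" "2023-03-03 08:00" | suspicious_lines
```

Filtering by time rather than by boot ID keeps the output focused on the hours when the applications disappeared, which may make the log short enough to read by hand.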

2 Answers


So @harrymc’s suggestion to test the RAM using MemTest86 was the answer. One of the memory sticks was bad and that is likely the cause of my problems. The logs around the evening suggested by @daniel-b were inconclusive as there are too many log entries to sift through.

But since this is likely a RAM issue, the logs probably won't help me confirm the problem of my applications crashing. I am going to see how things go with the bad RAM stick removed. Thanks for all of your help!

Giacomo1968
trinsic
  • While it's great that you solved this issue based on advice in the comments, you should really have followed up in the comments by telling @harrymc to post the MemTest86 comment as a full answer. While they previously posted comments as an “[answer](https://superuser.com/a/1771909/167207)” that was then deleted and converted to comments, you should have at least given harrymc the chance to post a new answer that states assuredly, “You should use MemTest86 to test your system’s RAM since this sounds like a RAM issue to me.” – Giacomo1968 Mar 05 '23 at 23:23
  • Ok will do that next time. Unless I can do something else to make that happen now. – trinsic Mar 06 '23 at 02:26
  • Well, @harrymc could just post a new answer now and then it is your turn to upvote and accept that answer. – Giacomo1968 Mar 06 '23 at 12:42

Advice on finding where a restart is done (or if there is any software causing restarts via the `reboot` command):

  • Find last boot time with: `who -b`, or `uptime` (or `tuptime -t`) to see how long it has been up
  • Check `/var/log/messages` around this time
  • Search for restart in cron scripts: `grep -re "sudo reboot" /etc/cron.d/`
  • If desperate, search everywhere: `grep -re "sudo reboot" /etc/ /var/`

If you can't find any software that could be the cause, you should perhaps look for hardware causes (perhaps power).
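
The search steps above could be wrapped in a small helper, sketched here under some assumptions. `find_reboot_calls` is a hypothetical name, and the pattern is deliberately wider than the literal `"sudo reboot"` string used above (it also matches `shutdown` and `poweroff` as whole words), so expect some false positives.

```shell
#!/usr/bin/env bash

# find_reboot_calls (hypothetical helper): list files under the given
# directories that mention a reboot-style command.
#   -r  recurse into directories
#   -I  skip binary files
#   -l  print only the names of matching files
#   -w  match whole words only
#   -E  extended regular expressions
# Errors such as "permission denied" are discarded.
find_reboot_calls() {
    grep -rIlwE 'reboot|shutdown|poweroff' "$@" 2>/dev/null
}

# Usage (run from a root shell to read every file):
#   who -b                              # time of the last boot
#   uptime                              # how long the system has been up
#   find_reboot_calls /etc/cron.d /etc/cron.daily
```

Printing file names instead of matching lines keeps the output short when searching large trees such as `/etc/` and `/var/`; each hit can then be inspected individually.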

As a first hardware test, I would suggest running [MemTest86](https://www.memtest86.com/) at least overnight. It will test the RAM, but also general system functions. Also check the SMART attributes of the disk. If both this and MemTest86 come out clean, then this is likely not a hardware error on your computer (although the possibility of it coming from your environment remains).
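
For the SMART check, one possible sketch uses `smartctl` from the smartmontools package, assuming it is installed. The helper name `smart_report` and the device names in the usage comment are placeholders and should be replaced with the disks actually present.

```shell
#!/usr/bin/env bash

# smart_report (hypothetical helper): print the SMART health verdict and
# attribute table for each device given as an argument.
smart_report() {
    local dev
    for dev in "$@"; do
        echo "== $dev =="
        # -H: overall health self-assessment, -A: vendor attribute table
        smartctl -H -A "$dev"
    done
}

# Usage (needs root; device names are examples only):
#   smart_report /dev/sda /dev/sdb
# or, for a single disk without the helper:
#   sudo smartctl -H -A /dev/sda
```

MemTest86 itself cannot be scripted this way, since it boots from its own USB media rather than running inside the operating system.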

harrymc
  • I have been running without trouble since, so I think I have confirmed that the bad RAM stick was causing all my problems. I originally posted this question to get help with my browsing apps crashing and not responding. But I also had problems with my file system becoming corrupted from time to time, to the point where my operating system would no longer respond. I should have tested my RAM fully a long time ago. Save yourself the trouble: I spent countless hours tracking down issues I thought were related to my OS or apps. Thank you for the heads up. – trinsic Mar 17 '23 at 20:23