5

Every time I suspend or resume my laptop (Dell Latitude E6520, bought this year), two messages of the following form are displayed on the console just before it shuts down or starts up:

[  407.107610] ehci_hcd 0000:00:1d.0: dma_pool_free buffer-128, f6f18000/36f18000 (bad dma)

On occasion, I get a message of the form:

[ 3753.979066] do_IRQ: 0.177 No irq handler for vector (irq -1)

On occasion, my machine freezes with a flashing Caps Lock button when suspending, after which I need to do a hard shutdown. This never happened before the messages started appearing (a while back), and I think it never happens without a do_IRQ message appearing (although I'm not sure about that). [There's nothing in the owner's manual on a flashing Caps Lock button; apparently it may be a kernel panic if the scroll lock also flashes, but the laptop doesn't have a scroll lock light, and there's no message on the console saying kernel panic.]

Are these bad DMA/do IRQ messages serious, and what can I do to investigate/troubleshoot them and the freezing?

Edit: I've also now received the following error messages a few times:

[246943.023908] JBD: I/O error detected when updating journal superblock for sdb1.
[246943.023958] Buffer I/O error on device sdb1, logical block 0
[246943.023996] EXT3-fs (sdb1): I/O error while writing superblock

Edit: Output of dmesg at http://pastebin.com/ra7MTQEj ; contents of /var/log/kern.log at http://pastebin.com/i6jf0Md9

Edit: the output of some smartctl (-a, -x, --log=error, --log=xerror) instructions is available at http://paste.ubuntu.com/1088488/ .

Edit (31/8/2012): Output of dmesg|grep -i ehci available at http://paste.ubuntu.com/1177246/ .

Edit: (3/9/2012):Output of lshw is at http://paste.ubuntu.com/1183032

Steve Kroon
  • **I really don't know**, but I suggest you paste the output of the following command: **dmesg | grep -i ehci**, to see whether a hard disk or a USB drive is involved. Otherwise, **sdb1** has nothing to do with **sda1**. If you check (**fsck**) your root partition (maybe **sda1**) from within your running system, it seems you could lose everything. If the error comes from a USB device, it could be your **external mouse** or **external keyboard**. – Salvador Aug 30 '12 at 17:42
  • I am now wondering if the **kern.log** is correctly pasted. Before you paste this file you should reboot your laptop. – Salvador Aug 31 '12 at 17:51
  • The issues I have are at suspend and resume. I pasted the `kern.log` after resuming. Do you want me to reboot and post that `kern.log` as well? – Steve Kroon Sep 02 '12 at 13:08

3 Answers

4

1. "Bad DMA"

Let's deal with the "bad dma" errors first, since they're the only consistent ones which are reflected in your logs.

  • These, as well as any problems suspending/resuming, are caused by your internal USB 3G modem, which from the MAC address is an Ericsson F3507g.
    • Yes, you read that right. Not every USB device has to be external or plugged into one of the visible USB ports. Modern laptops will run a whole bunch of internal peripherals such as Wireless/3G cards, bluetooth, webcams, etc. from an internal USB "hub".

Notice this tell-tale sequence, which repeats every time the "bad dma" errors occur:

[171783.085166] usb 2-1.6: USB disconnect, device number 10
[171783.086623] ehci_hcd 0000:00:1d.0: dma_pool_free buffer-128, eafaa000/2afaa000 (bad dma)
[171783.087046] cdc_ncm 2-1.6:1.6: usb0: unregister 'cdc_ncm' usb-0000:00:1d.0-1.6, CDC NCM
[171783.092382] done.
[171783.129959] ehci_hcd 0000:00:1d.0: dma_pool_free buffer-128, eb1aa000/2b1aa000 (bad dma)
  • The cdc_ncm module is implicated; this is a low-level USB interface to high-speed cellular modems.
  • This bug indicates that the F3507g WWAN cards have had similar problems with Ubuntu/Linux before, and a kernel update fixed it.
    • The error should only affect suspend/resume/freezing, and should NOT affect normal operation of the 3G card.
    • But I'd recommend you try one of the mainline kernels (or the Quantal 3.5 kernel), to see if it makes any difference.
    • The other extreme alternative, of course, is to either disable your 3G card in the BIOS, or if you actively use it, consider replacing it with another brand/model.
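
If you want to check whether this disconnect-then-"bad dma" pattern holds in your own log, a rough sketch like the one below may help. The function name `bad_dma_near_disconnect` and the 1-second window are my own assumptions, not anything standard; it relies only on the bracketed kernel timestamps shown in the excerpt above.

```shell
# Hypothetical helper: count "bad dma" lines that appear within 1 second
# after a "USB disconnect" line, using the [seconds.micros] kernel timestamp.
bad_dma_near_disconnect() {
    awk '
        # Pull the kernel timestamp, e.g. "[171783.085166]", out of the line.
        match($0, /\[ *[0-9]+\.[0-9]+\]/) {
            ts = substr($0, RSTART + 1, RLENGTH - 2) + 0
        }
        /USB disconnect/ { last = ts; seen = 1 }
        /bad dma/ {
            total++
            # ts >= last guards against timestamps resetting after a reboot.
            if (seen && ts >= last && ts - last <= 1) near++
        }
        END {
            printf "%d of %d bad dma lines within 1s of a USB disconnect\n", near, total
        }
    ' "$1"
}

# Example (path is Ubuntu's usual kernel log location):
# bad_dma_near_disconnect /var/log/kern.log
```

If most of the "bad dma" lines are counted as near a disconnect, that would be consistent with the modem being torn down at suspend; if not, something else may be going on.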

2. "do_IRQ" and "sdb1"

It's harder to debug these isolated warnings without context (which can be the key, as shown above). So we'll just have to guesstimate until you can provide a kern.log containing one or both of these errors.

  • "do_IRQ" seems to stem most often from PCI-Express bus issues, including graphics cards, and VIA chipsets are often implicated.
    • This message can otherwise be safely ignored.
  • Given that your SMART logs look OK, the "sdb1" errors probably come from even more USB communication issues with the external drive.

    • If you find more USB errors around these, I'd chalk it up to an occasional USB incompatibility and not worry; but if they occur only by themselves, it may indicate a problem with the drive. A more complete log would help :)
  • Again, I'd recommend trying one of the Quantal 3.5 kernels and seeing if things change, especially for the "do_IRQ".
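
To confirm that sdb really is a USB-attached drive while it is plugged in, you could look at where its sysfs entry points. This is only a sketch: `disk_bus` is a hypothetical helper name, and it assumes that USB disks have a `usbN` component in their resolved `/sys/block` path, which is the usual layout on Linux.

```shell
# Hypothetical helper: print "usb" if the disk's sysfs path goes through a
# USB host controller, "other" otherwise.
disk_bus() {
    dev="$1"                      # e.g. sdb (the whole disk, not sdb1)
    sysroot="${2:-/sys/block}"    # overridable, mainly for testing
    [ -e "$sysroot/$dev" ] || return 1
    path=$(readlink -f "$sysroot/$dev") || return 1
    case "$path" in
        */usb[0-9]*/*) echo usb ;;
        *)             echo other ;;
    esac
}

# Example:
# disk_bus sdb
```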

3. Trying the 3.5-series Quantal Kernel (or a mainline build)

  • Once Ubuntu 12.10 is released, its kernel will be made available for 12.04 as a "backport" (the same goes for 13.04 and 13.10).
  • Right now, you can get the "beta" kernels from the Ubuntu-X team PPA.
  • BUT this PPA also contains a number of extra packages which you have no need to upgrade.
  • So I've made just the backported kernel available in another PPA.
  • To install:

    sudo apt-add-repository ppa:auanswers/lts-backported-kernels-prerelease
    sudo apt-get update
    sudo apt-get install linux-generic-lts-quantal
    
  • Reboot, and you should boot into the new kernel (check with uname -a). Nvidia/AMD graphics and Broadcom wireless cards may be problematic. You can always select your old 3.2-series kernel by keeping Shift pressed at boot until the Grub menu shows, and then going into "Previous Linux Versions"

  • For even more bleeding-edge kernels, you can try one of the mainline builds. Please see this question and answer for more information:

Should I upgrade to the "mainline" kernels?
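
After rebooting, a quick way to confirm that you are actually running a 3.5 or newer kernel is to compare version strings. The sketch below uses a hypothetical helper name, `kernel_at_least`, and assumes GNU `sort` with the `-V` (version sort) option, which Ubuntu ships.

```shell
# Hypothetical helper: succeed if the kernel version ($2, default: the
# running kernel) is at least the wanted version ($1).
kernel_at_least() {
    want="$1"
    have="${2:-$(uname -r)}"
    # With version sort, the smaller version comes first; if that is $want,
    # then $have is the same or newer.
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

if kernel_at_least 3.5; then
    echo "Running $(uname -r): 3.5 or newer"
else
    echo "Running $(uname -r): still older than 3.5"
fi
```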

ish
  • FYI: I upgraded from Oneiric to precise to see if that would help as a first step, but no luck. In fact, I now get other errors from `watchdog` and `mei` about devices registering a watchdog... So, I'll try the Quantal kernel next. – Steve Kroon Sep 03 '12 at 10:05
  • Wait, you're on Oneiric !? – ish Sep 03 '12 at 10:12
  • I was, but not anymore. – Steve Kroon Sep 03 '12 at 17:17
  • The Quantal kernel got rid of the bad dma errors, thanks! How should I proceed in the future? (When/how should I remove the PPA and add the backports?) – Steve Kroon Sep 03 '12 at 18:33
2

The errors you added in your edit seem to point to a bad disk sector.

Have you tried running fsck or badblocks?

I suggest doing everything from a live CD environment, as follows:

  1. Boot the live Ubuntu CD (or any other distro)
  2. Scan for disks and partitions with fdisk

    sudo fdisk -l
    
  3. Once you have identified the correct partition (for example /dev/sda1), try running these two commands. The -c option to e2fsck tries to identify and isolate bad blocks.

    sudo e2fsck -cv /dev/sda1
    sudo badblocks -sv /dev/sda
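
Since e2fsck should not be run on a mounted filesystem (which is the reason for using the live CD), it may be worth checking that first. The sketch below is one way to do it; `is_mounted` is a hypothetical helper name, and it assumes the device appears under the same name in the first column of `/proc/mounts`.

```shell
# Hypothetical helper: succeed if the device ($1) is listed as mounted in
# the mounts table ($2, default /proc/mounts).
is_mounted() {
    dev="$1"
    mounts="${2:-/proc/mounts}"
    # The first field of each line is the device; require an exact match.
    awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$mounts"
}

# Example (not run here, since it needs root and a real device):
# if is_mounted /dev/sda1; then
#     echo "/dev/sda1 is mounted; unmount it before running e2fsck"
# else
#     sudo e2fsck -cv /dev/sda1
# fi
```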
    
jokerdino
Andrea Olivato
  • Thanks - will follow up on this when I can get a Live CD - hopefully tomorrow. Meanwhile, it's disappointing to note that neither the `-c` nor the `-v` option of `fsck` appears in its man page. (What does the `-v` do?) – Steve Kroon Aug 30 '12 at 06:45
  • Sorry, the program to use was e2fsck, not fsck. With e2fsck, you can see from the manual that -c isolates bad blocks and -v makes it verbose. – Andrea Olivato Aug 30 '12 at 09:51
  • Well, this is all academic, since there is only one hard drive in my machine, and it's /dev/sda, not /dev/sdb. (In fact, there doesn't seem to be a /dev/sdb on my machine at all.) However, perhaps /dev/sdb is my external drive that I have plugged in on occasion. I will try running these tests on that drive. – Steve Kroon Aug 31 '12 at 10:02
  • I ran e2fsck on /dev/sdb1, output at `http://paste.ubuntu.com/1181414/` - no bad blocks found. – Steve Kroon Sep 02 '12 at 13:06
  • 1 large file, 1 directory?? That means you have just one file. Are you sure you ran the command on the right volume? Can you post the output of `sudo fdisk -l`? – Andrea Olivato Sep 03 '12 at 14:51
  • I deleted that part of my comment - sdb1 was an empty partition I had previously used as a bootable Ubuntu. So the partition was empty before I ran e2fsck, which only added the lost+found directory. – Steve Kroon Sep 03 '12 at 17:19
1

For the "no irq for vector" issue, try adding "pci=nomsi" to the kernel boot options.
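
On Ubuntu, the usual way to make a boot option permanent is to add it to the `GRUB_CMDLINE_LINUX_DEFAULT` line in `/etc/default/grub` and then run `sudo update-grub`. As a rough sketch of the edit itself (the helper name `add_kernel_param` is hypothetical, and it assumes GNU `sed` and a double-quoted value on that line):

```shell
# Hypothetical helper: append a kernel parameter ($1) to the
# GRUB_CMDLINE_LINUX_DEFAULT line of a GRUB defaults file ($2).
add_kernel_param() {
    param="$1"
    file="$2"
    # Leave the file alone if the parameter is already on that line.
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*$param" "$file"; then
        return 0
    fi
    # Insert the parameter just before the closing double quote.
    sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 $param\"/" "$file"
}

# Example (not run here, since it needs root):
# sudo cp /etc/default/grub /etc/default/grub.bak
# sudo sh -c "$(declare -f add_kernel_param 2>/dev/null); add_kernel_param pci=nomsi /etc/default/grub"
# sudo update-grub
# After rebooting, `cat /proc/cmdline` should include pci=nomsi.
```

Editing the file by hand in a text editor works just as well. To try the option once without changing anything, you can also press `e` at the Grub menu and add it to the line starting with `linux`.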

Colin Ian King