Since yesterday evening I've had the following issue with my home-built Linux (openSUSE Leap 15.0) home server: one of its two on-board Ethernet links keeps going down, then up, then down again, quite frequently.

Normally these outages last only a couple of seconds and go unnoticed, but I've also had downtimes of several minutes. One of them even caused the RAID on my server to lose its consistency.

Here is the output of dmesg (messages from the other network interface, which is not affected, have been removed):

[  815.518384] igb 0000:01:00.0 eth2: igb: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[  835.069925] igb 0000:01:00.0 eth2: igb: eth2 NIC Link is Down
[  838.322327] igb 0000:01:00.0 eth2: igb: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[  897.277739] igb 0000:01:00.0 eth2: igb: eth2 NIC Link is Down
[  900.218140] igb 0000:01:00.0 eth2: igb: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[  974.621515] igb 0000:01:00.0 eth2: igb: eth2 NIC Link is Down
[  977.501918] igb 0000:01:00.0 eth2: igb: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 1069.501256] igb 0000:01:00.0 eth2: igb: eth2 NIC Link is Down
[ 1073.073644] igb 0000:01:00.0 eth2: igb: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 1126.653085] igb 0000:01:00.0 eth2: igb: eth2 NIC Link is Down
[ 1129.465504] igb 0000:01:00.0 eth2: igb: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 1440.928076] igb 0000:01:00.0 eth2: igb: eth2 NIC Link is Down
[ 1444.176477] igb 0000:01:00.0 eth2: igb: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 1447.680055] igb 0000:01:00.0 eth2: igb: eth2 NIC Link is Down
[ 1450.888453] igb 0000:01:00.0 eth2: igb: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 1532.575777] igb 0000:01:00.0 eth2: igb: eth2 NIC Link is Down
[ 1535.812158] igb 0000:01:00.0 eth2: igb: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 1554.875708] igb 0000:01:00.0 eth2: igb: eth2 NIC Link is Down
[ 1557.728109] igb 0000:01:00.0 eth2: igb: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 1688.103281] igb 0000:01:00.0 eth2: igb: eth2 NIC Link is Down
[ 1691.411675] igb 0000:01:00.0 eth2: igb: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 1753.243072] igb 0000:01:00.0 eth2: igb: eth2 NIC Link is Down
[ 1756.119469] igb 0000:01:00.0 eth2: igb: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 1781.274983] igb 0000:01:00.0 eth2: igb: eth2 NIC Link is Down
[ 1814.927299] igb 0000:01:00.0 eth2: igb: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 1839.818791] igb 0000:01:00.0 eth2: igb: eth2 NIC Link is Down
[ 2108.682340] igb 0000:01:00.0 eth2: igb: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 2119.769891] igb 0000:01:00.0 eth2: igb: eth2 NIC Link is Down
[ 2128.118276] igb 0000:01:00.0 eth2: igb: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 2172.877720] igb 0000:01:00.0 eth2: igb: eth2 NIC Link is Down
[ 2203.106035] igb 0000:01:00.0 eth2: igb: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 2207.337608] igb 0000:01:00.0 eth2: igb: eth2 NIC Link is Down
[ 2465.245184] igb 0000:01:00.0 eth2: igb: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 4206.183121] igb 0000:01:00.0 eth2: igb: eth2 NIC Link is Down
[ 4209.203544] igb 0000:01:00.0 eth2: igb: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
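
To catch further flaps as they happen, the kernel log can also be watched live. A minimal sketch, assuming the stock systemd and iproute2 tooling on Leap 15.0:

# follow kernel messages and show only eth2 link events
journalctl -k -f | grep --line-buffered 'eth2.*Link is'
# alternatively, watch netlink state changes for all interfaces
ip monitor link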

The network interface is an Intel I210 Gigabit Network Connection (eth2) on an Asus board. And here is the OS version:

╭─root@valen ~  
╰─➤  uname -a
Linux valen 4.12.14-lp150.12.64-default #1 SMP Mon Jun 17 16:53:50 UTC 2019 (3edfd41) x86_64 x86_64 x86_64 GNU/Linux

The last system update was on Tuesday, but the problems only started on Friday evening, so I don't think this is a driver issue.

What could be the cause? Thank you.

UPDATE: Upon request, here is the ethtool output:

╭─root@valen ~  
╰─➤  ethtool -S eth2
NIC statistics:
     rx_packets: 257315
     tx_packets: 603706
     rx_bytes: 85237411
     tx_bytes: 775258267
     rx_broadcast: 1023
     tx_broadcast: 419
     rx_multicast: 75
     tx_multicast: 104
     multicast: 75
     collisions: 0
     rx_crc_errors: 0
     rx_no_buffer_count: 0
     rx_missed_errors: 0
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_window_errors: 0
     tx_abort_late_coll: 0
     tx_deferred_ok: 0
     tx_single_coll_ok: 0
     tx_multi_coll_ok: 0
     tx_timeout_count: 0
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     rx_align_errors: 0
     tx_tcp_seg_good: 82646
     tx_tcp_seg_failed: 0
     rx_flow_control_xon: 21
     rx_flow_control_xoff: 21
     tx_flow_control_xon: 0
     tx_flow_control_xoff: 0
     rx_long_byte_count: 85237411
     tx_dma_out_of_sync: 0
     tx_smbus: 0
     rx_smbus: 0
     dropped_smbus: 0
     os2bmc_rx_by_bmc: 0
     os2bmc_tx_by_bmc: 0
     os2bmc_tx_by_host: 0
     os2bmc_rx_by_host: 0
     tx_hwtstamp_timeouts: 0
     tx_hwtstamp_skipped: 0
     rx_hwtstamp_cleared: 0
     rx_errors: 0
     tx_errors: 0
     tx_dropped: 0
     rx_length_errors: 0
     rx_over_errors: 0
     rx_frame_errors: 0
     rx_fifo_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     tx_queue_0_packets: 103348
     tx_queue_0_bytes: 114658835
     tx_queue_0_restart: 0
     tx_queue_1_packets: 61601
     tx_queue_1_bytes: 77258944
     tx_queue_1_restart: 0
     tx_queue_2_packets: 94305
     tx_queue_2_bytes: 97486823
     tx_queue_2_restart: 0
     tx_queue_3_packets: 344452
     tx_queue_3_bytes: 483419252
     tx_queue_3_restart: 0
     rx_queue_0_packets: 30966
     rx_queue_0_bytes: 4981061
     rx_queue_0_drops: 0
     rx_queue_0_csum_err: 0
     rx_queue_0_alloc_failed: 0
     rx_queue_1_packets: 137558
     rx_queue_1_bytes: 65059503
     rx_queue_1_drops: 0
     rx_queue_1_csum_err: 0
     rx_queue_1_alloc_failed: 0
     rx_queue_2_packets: 42880
     rx_queue_2_bytes: 7877235
     rx_queue_2_drops: 0
     rx_queue_2_csum_err: 0
     rx_queue_2_alloc_failed: 0
     rx_queue_3_packets: 45911
     rx_queue_3_bytes: 6290352
     rx_queue_3_drops: 0
     rx_queue_3_csum_err: 0
     rx_queue_3_alloc_failed: 0
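
All of the error counters (CRC, alignment, carrier, missed) are zero, which points away from corrupted frames on the wire. For reference, two further checks that could narrow this down; I have not run them here, and note that the offline self-test briefly takes the link down:

# show negotiated speed/duplex, auto-negotiation and link-detected state
ethtool eth2
# run the driver's built-in self-test, if supported (interrupts traffic while it runs)
ethtool -t eth2 offline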
  • According to the `dmesg`, the link is down for about 19% of the time (2757.476211 seconds up out of 3393.685160 seconds measured). Perhaps the cable is defective, or one of the two port ends has gone bad? I had a flaky link on a home computer before and came up with [these troubleshooting tips](https://superuser.com/a/954613/83694). – Deltik Jul 13 '19 at 16:25
  • Both the cables and the port ends haven't been changed for 3 years. Is it possible that the cable or even one of the ports has worn out? – Neppomuk Jul 13 '19 at 16:44
  • @Neppomuk Can you check the cable and maybe even replace it to test? The reality is this all sounds like a hardware problem equating to a bad cable. “Both the cables and the port ends haven't been changed for 3 years.” That doesn’t mean that nothing could have happened to affect the connection. – Giacomo1968 Jul 13 '19 at 17:04
  • Yes, sure I could do that — as soon as I've got a new cable, of course. You mean that my previous cable could have worn out in the meantime? – Neppomuk Jul 13 '19 at 17:19
  • I’ve often seen this behaviour when a switch port (or switch) goes bad. Try another port, and if possible swap the switch. Also, if you have made any recent changes to the network, make sure you have not introduced any loops - the switch may be blocking if a loop is detected. – Jens Ehrich Jul 13 '19 at 18:16
  • @JensEhrich: No, I haven't changed anything at all about my network. And yes, I once had an issue with a broken switch, but then nothing worked at all. – Neppomuk Jul 13 '19 at 21:17

1 Answer

OK, I think I've found the answer: the RJ45 port on the switch that the cable from my server's eth2 was plugged into turned out to be faulty! I moved the cable to a previously unused port, and everything has been working fine ever since.
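
A quick sanity check, assuming kernel messages land in the systemd journal: count the link-down events in the kernel log since the cable was moved (adjust the --since time as needed); the count should stay at zero:

journalctl -k --since "yesterday" | grep -c 'eth2.*Link is Down'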

Nevertheless, thank you all for your hints!
