Strange temporary network outage in Linux
I'm facing a very annoying problem that I first noticed a week ago and for
which I can't find an answer: my network suddenly stops responding,
usually coming back exactly 25 seconds later. I was using kernel 3.10.4
and have now migrated to 3.11-rc4 to see if anything changed, but no, the
behavior is the same. Since the problem is hard to spot (ordinary web
surfing comes in "bursts" and the outages are completely random), I can't
really tell whether it was already present in earlier kernels as well
(I always use custom but unpatched kernels from kernel.org, all compiled
by myself).
I can't tell whether the kernel is the culprit either, but I can say there
are no clues in the system logs (I checked both /var/log/syslog and
/var/log/messages and found nothing unusual) and that the hardware
doesn't seem to be at fault, because the problem shows up with either one
of my network cards:
lspci output:
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5751
Gigabit Ethernet PCI Express (rev 01)
04:00.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone]
(rev 30)
I have also tried swapping ports on the ethernet switch, and still no one
else where I work has the problem except me (although we use similar
machines, I'm the only one running Linux, so I've had to put up with some
jokes about it as well... hehe).
I started Wireshark on my machine and left the machine continuously pinging
our gateway and another machine on the same network segment. At the first
sign of a network malfunction I would check the pings. Sometimes the
gateway has stopped responding to pings while the other machine is still
responding normally; at other times it is the other machine that stops
responding while the gateway is fine; and sometimes both stop responding.
I don't know what else to do, so I'd like some help or tips on how to debug
this further, since the system logs are completely normal.
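The ping monitoring described above could be scripted so that the start and end of each outage are recorded per host, which would make it easier to confirm the 25-second figure. What follows is only a minimal sketch, assuming Python 3 and the Linux iputils ping (its -c and -W options); the function names ping_once, find_outages and monitor are made up for illustration and are not part of the setup described in this post.

```python
import subprocess
import time
from datetime import datetime


def ping_once(host, timeout_s=1):
    """Return True if a single ICMP echo to `host` is answered.

    Assumes the Linux iputils `ping` (-c count, -W reply timeout in seconds).
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def find_outages(samples):
    """Group consecutive failed probes into (host, start, end) outages.

    `samples` is an iterable of (timestamp, host, ok) tuples in time order.
    `start` is the first failed probe; `end` is the first successful probe
    after it, or None if the host had not recovered by the end of the data.
    """
    down_since = {}  # host -> timestamp of the first failed probe
    outages = []
    for ts, host, ok in samples:
        if not ok and host not in down_since:
            down_since[host] = ts
        elif ok and host in down_since:
            outages.append((host, down_since.pop(host), ts))
    # Hosts that were still down when the data ended
    for host, start in down_since.items():
        outages.append((host, start, None))
    return outages


def monitor(hosts, duration_s, interval_s=1.0):
    """Probe every host once per interval and return the collected samples."""
    samples = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        for host in hosts:
            ok = ping_once(host)
            samples.append((time.time(), host, ok))
            if not ok:
                stamp = datetime.now().isoformat(timespec="seconds")
                print(f"{stamp} {host} no reply")
        time.sleep(interval_s)
    return samples
```

As a usage example, find_outages(monitor(["192.0.2.1", "192.0.2.10"], duration_s=3600)) would watch two hosts for an hour and return one tuple per outage; the two addresses are placeholders for the gateway and the neighbouring machine. Subtracting start from end in each tuple gives the outage length, accurate to roughly one probe interval.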
I have my kernel config file and a Wireshark capture file showing the
situation. I can post them here or on a pastebin site if anyone would find
them useful for understanding the case; just let me know what level of
detail I should include (I guess the packet level without the raw data
would be enough). Thanks in advance for any help!