Ticket #2255 (closed outage: fixed)

Opened 4 years ago

Last modified 4 years ago

Very poor CVS+Koji connectivity

Reported by: monnerat Owned by: mmcgrath
Priority: critical Milestone:
Component: Systems Version:
Severity: High Keywords:
Cc: alexlan Blocked By:
Blocking: Sensitive:

Description

Phenomenon

Connection attempts to cvs.fedoraproject.org or koji.fedoraproject.org on the afternoon of July 5th, 2010 (CEST) fail about 20% of the time.

I've spent the afternoon trying to execute the common/cvs-import.sh script for a package that has ~15 CVS sources and patches, at last succeeding for devel and F-12, but not (yet) for F-13.

The error never occurs at the same place (i.e., it appears to be network-dependent), and the most common error is:

ssh: connect to host cvs.fedoraproject.org port 22: connection timed out
cvs [add aborted]: end-of-file from server (consult above messages if any)

Other connections work fine (I do not think the problem is on my side).
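An intermittent failure rate like this can be measured rather than estimated. The following is a minimal sketch (not part of any Fedora tooling; the host and port come from the report above, everything else is hypothetical) that repeatedly opens a TCP connection and reports the fraction of attempts that fail:

```python
import socket

def probe(host, port, attempts=20, timeout=10):
    """Repeatedly open a TCP connection and return the failure rate (0.0-1.0)."""
    failures = 0
    for _ in range(attempts):
        try:
            # create_connection does the full TCP handshake, so a timeout
            # here corresponds to the "connection timed out" ssh error.
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:
            failures += 1
    return failures / attempts

# e.g. probe("cvs.fedoraproject.org", 22)
```

A result around 0.2 would match the reporter's subjective 20% estimate.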

Recommendation

See thread at http://lists.fedoraproject.org/pipermail/devel/2010-July/138355.html

Change History

comment:1 Changed 4 years ago by alexlan

  • Priority changed from major to critical
  • Cc alexlan added

I can confirm similar outages with both CVS and koji (both web and uploading), e.g.:

cvs up
ssh: connect to host cvs.fedoraproject.org port 22: Connection timed out
cvs [update aborted]: end of file from server (consult above messages if any)

This happens on roughly every other CVS connection attempt: I try once and get the error, try again and it works, then get the error again.
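When roughly every other attempt succeeds, a simple retry wrapper is a practical stopgap until the network issue is resolved. A sketch (a hypothetical helper, not part of the CVS or koji tooling) that re-runs a failing command a few times:

```python
import subprocess
import time

def run_with_retry(cmd, retries=5, delay=10):
    """Run cmd until it exits 0 or retries are exhausted; return last exit code."""
    rc = 1
    for attempt in range(retries):
        rc = subprocess.call(cmd)
        if rc == 0:
            return 0
        # Back off briefly before retrying, since failures are intermittent.
        time.sleep(delay)
    return rc

# e.g. run_with_retry(["cvs", "up"])
```

With a ~50% per-attempt failure rate, five independent retries would fail only about 3% of the time.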

Also, attempting to use "make srpm-scratch-build" quite frequently stalls or times out while uploading the SRPM to koji.

comment:2 Changed 4 years ago by till

You can try to spot the problem using one of:

sudo mtr cvs.fedoraproject.org
sudo tcptraceroute cvs.fedoraproject.org 22

These commands show no problem from here.

comment:3 Changed 4 years ago by monnerat

# mtr --report -c 100 cvs.fedoraproject.org
HOST: linuxdev.datasphere.ch      Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. firewall.datasphere.ch        0.0%   100    0.1   0.1   0.1   4.5   0.4
  2. fa0-0.rt1.plo1.dfinet.net     0.0%   100    0.8   1.1   0.7   5.4   0.7
  3. gi1-2.rt-b2.cc.dfinet.net     0.0%   100    0.8   3.2   0.7 176.5  17.8
  4. ge-2-4.r00.gnvasw01.ch.bb.gi  0.0%   100    1.0  28.5   0.9 191.4  50.4
  5. xe-3-1.r01.gnvasw01.ch.bb.gi  0.0%   100    1.6  24.4   1.0 258.8  51.8
  6. ge-8-4.r00.frnkge03.de.bb.gi  0.0%   100   11.8  30.6  11.5 237.4  45.7
  7. xe-0.globalcrossing.frnkge03  0.0%   100   14.0  21.9  11.5 188.3  36.3
  8. internap-ken-schmid-phx.ge-3  0.0%   100  168.4 171.2 168.4 341.8  18.3
  9. border1.po1-bbnet1.phx004.pn  0.0%   100  169.4 177.8 169.0 338.6  28.8
 10. redhat-2.border1.phx004.pnap  0.0%   100  169.7 173.6 169.6 285.3  18.3
 11. ???                          100.0   100    0.0   0.0   0.0   0.0   0.0
 12. ???                          100.0   100    0.0   0.0   0.0   0.0   0.0
 13. ???                          100.0   100    0.0   0.0   0.0   0.0   0.0
 14. ???                          100.0   100    0.0   0.0   0.0   0.0   0.0
 15. ???                          100.0   100    0.0   0.0   0.0   0.0   0.0
 16. cvs.fedoraproject.org        21.0%   100  172.9 173.3 172.9 179.1   0.9

... it seems my subjective 20% estimate was quite accurate ;-)

comment:4 Changed 4 years ago by mmcgrath

  • Status changed from new to assigned
  • Owner changed from nobody to mmcgrath

This is a strange network issue; it's not impacting everyone, and for some people it is a total outage right now (like those in the Westford office). I'll keep everyone posted when I hear more. At the moment we just know that people are looking into it.

comment:5 Changed 4 years ago by mmcgrath

  • Component changed from General to Systems
  • Severity changed from Normal to High

comment:6 Changed 4 years ago by mmcgrath

This should be fixed now, but we haven't heard a root cause yet.

comment:7 Changed 4 years ago by monnerat

Works perfectly for me today. Thanks for taking action.

comment:8 Changed 4 years ago by mmcgrath

  • Resolution set to fixed
  • Status changed from assigned to closed

Now the bad news: it's fixed, but they have no idea what was going on or what fixed it. It's unlikely we're going to get a root cause for this :-/

Note: See TracTickets for help on using tickets.