Ticket #2125 (closed outage: fixed)

Opened 4 years ago

Last modified 4 years ago

PHX2 outage

Reported by: mmcgrath Owned by: mmcgrath
Priority: blocker Milestone:
Component: Systems Version: Production
Severity: The Sky Is Falling Keywords:
Cc: Blocked By:
Blocking: Sensitive:

Description

phenomenon

PHX2 was completely unavailable. This took the buildsystem offline, along with every app that requires our common storage data layer.

reason

Power outage. Hosts came back online, but the network was unavailable. Most services were back online in two and a half hours. pkgdb took longer because bugzilla was offline. Some services, like mirrormanager, were only down for a couple of minutes while we put some temporary fixes in place to keep them online. docs, the fedoraproject.org main site, start, etc. all remained online. Fedorahosted.org was fine except that logging in to the trac instances was down.

recommendation

Wait for the final RFO on this. The jury is still out. I think having a secondary offsite VPN would have helped keep downtime to a minimum, but outbound UDP is still in question.

Change History

comment:1 Changed 4 years ago by mmcgrath

Had another outage just now, lasted about 1 hour. Network related, root cause not completely clear yet.

comment:2 Changed 4 years ago by codeblock

  • Status changed from new to closed
  • Resolution set to fixed

old
