Ticket #1845 (closed task: fixed)

Opened 4 years ago

Last modified 4 years ago

The Move

Reported by: mmcgrath Owned by: mmcgrath
Priority: blocker Milestone: Fedora 13
Component: Systems Version: Production
Severity: The Sky Is Falling Keywords:
Cc: jcollie, nb, sparks, poelstra, rrix, mbooth, ianweller, amessina, mmahut, nigelj, ebrown, laxathom, petersen, susmit, toshio, mtasaka, mdomsch, jkratoch, hadess Blocked By:
Blocking: Sensitive:

Description (last modified by mmcgrath)

What

The Fedora Project is moving out of our primary PHX datacenter into a new one. This will involve loading several machines, disk trays, etc. onto a truck, traveling a couple of cities over, and unloading and re-racking them.

The problems

PHX is set up as Fedora's central hub. It contains our primary data layer and build system. These systems will be completely unavailable for multiple days while we move things.

Timing

We will start powering hosts down on December 12th. We expect to start bringing services back online on the 13th, but may not be completely done until December 15th.

Services that will be down

Services that will remain up

Services still in question

  • zodbot (our IRC bot)

Oddities

While some services above will be listed as available, those services may have stale data in them. MirrorManager, for example, will stop testing mirrors for readiness, meaning more stale mirrors will show up than normal.

Change History

comment:1 Changed 4 years ago by mmcgrath

  • Owner changed from lmacken to mmcgrath
  • Status changed from new to assigned
  • Description modified (diff)

comment:2 Changed 4 years ago by mmcgrath

  • Description modified (diff)

comment:3 Changed 4 years ago by mmcgrath

  • Component changed from Security to Systems

comment:4 Changed 4 years ago by mmcgrath

  • Description modified (diff)

comment:5 Changed 4 years ago by mmcgrath

  • Description modified (diff)

comment:6 Changed 4 years ago by jcollie

  • Cc jcollie added

comment:7 Changed 4 years ago by sparks

  • Cc sparks added

comment:8 Changed 4 years ago by mmcgrath

  • Description modified (diff)

comment:9 Changed 4 years ago by mmcgrath

  • Description modified (diff)

comment:10 Changed 4 years ago by poelstra

  • Cc poelstra added

comment:11 Changed 4 years ago by rrix

  • Cc rrix added

comment:12 Changed 4 years ago by mbooth

  • Cc mbooth added

comment:13 Changed 4 years ago by ianweller

  • Cc ianweller added

comment:14 Changed 4 years ago by amessina

  • Cc amessina added

comment:15 Changed 4 years ago by mmahut

  • Cc mmahut added

comment:16 Changed 4 years ago by cwickert

Please make sure that the voting system remains online the whole time because the move is scheduled during the election period of FESCo, FAMSCo and the board (December 8-15). If the voting system is not up, we need to reschedule the elections.

comment:17 Changed 4 years ago by pfrields

  • Cc nigelj added

I met with Mike McGrath, Nigel Jones, and John Rose about the election impact. We are going to open up elections earlier than planned, on December 5. In addition, if there is an unexpected, substantial outage of 8 hours or more, we will extend the end of elections by a day. For any additional day of outage, we will extend the end of elections by an additional day as well.

In addition, there will be a function available in the voting application that will allow a user to verify their already-recorded vote. While it's unlikely that an outage will occur, and it's also unlikely any outage would ruin a recorded vote, this will add a measure of confidence and security without causing us to run too far past our "30 days after release" guideline unnecessarily.
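To make the extension rule concrete, here is a minimal Python sketch of one possible reading of it. The function name `election_extension_days` is hypothetical, and counting only full 24-hour periods as an "additional day of outage" is an assumption; the policy above does not spell out how partial days are handled.

```python
def election_extension_days(outage_hours: float) -> int:
    """Days to add to the end of elections for a given outage length.

    Assumed reading of the rule: under 8 hours means no extension,
    8 hours or more means one day, and each further full 24-hour
    period of outage adds one more day.
    """
    if outage_hours < 8:
        return 0
    # At least one day once the 8-hour threshold is reached;
    # more once the outage spans two or more full days.
    return max(1, int(outage_hours // 24))


if __name__ == "__main__":
    for hours in (4, 8, 24, 48):
        print(hours, "hours of outage ->", election_extension_days(hours), "day(s)")
```

Under this reading, a 4-hour outage changes nothing, anything from 8 hours up to just under two days adds one day, and a two-day outage adds two.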

comment:18 Changed 4 years ago by ebrown

  • Cc ebrown added

comment:19 Changed 4 years ago by laxathom

  • Cc laxathom added

comment:20 Changed 4 years ago by sparks

Mike, do you know approximately what time the outage will start? I'd like to give the Docs folks a drop-dead time to push any updates to CVS so those documents will be updated on docs.fp.o.

Thanks!

comment:21 Changed 4 years ago by petersen

  • Cc petersen added

comment:22 Changed 4 years ago by mmcgrath

We got network connectivity in PHX2 today. Nigel is working on getting the db servers up, and I've got the new bastion up (for mail and VPN).

Tomorrow we'll be scheduling an outage to do the cutover for the databases. We'll also be using this time to start using the new VPN. I'm sending out email notifications now.

comment:23 Changed 4 years ago by susmit

  • Cc susmit added

comment:24 Changed 4 years ago by toshio

  • Cc toshio added

comment:25 Changed 4 years ago by mtasaka

  • Cc mtasaka added

comment:26 Changed 4 years ago by mmcgrath

  • Description modified (diff)

I just moved some services into the known-up and known-down categories. Mail is the last big one left unknown; it's being worked on but is not complete at this time.

comment:27 Changed 4 years ago by skvidal

Email is flowing now on bastion3. I set up a new transport_maps in our main.cf that has only 2 entries: any mail bound for @*.redhat.com or *@redhat.com gets passed to ext-mx.corp.redhat.com.

The rest of the mail is delivered directly. This is an improvement over our former situation, since mail going to non-redhat.com addresses won't have to wait for an extra hop or two inside @redhat.com for delivery.

Mail delivery will be switched over soon. I'll put the transport_maps change in puppet so our system has the change as well.
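The routing decision described in this comment can be illustrated with a short Python sketch. This is not the Postfix configuration itself, only a model of the rule as stated: recipients at redhat.com or any of its subdomains go to the relay, and everything else is delivered directly. The function name `next_hop` and the `"direct"` return value are hypothetical, chosen for illustration.

```python
# Relay host named in the comment above.
RELAY_HOST = "ext-mx.corp.redhat.com"


def next_hop(recipient: str) -> str:
    """Return the relay for redhat.com recipients, or "direct" otherwise.

    Illustrative only: models the two transport entries described above,
    one for redhat.com itself and one for its subdomains.
    """
    # Take everything after the last "@" as the recipient domain.
    domain = recipient.rsplit("@", 1)[-1].lower()
    if domain == "redhat.com" or domain.endswith(".redhat.com"):
        return RELAY_HOST
    return "direct"


if __name__ == "__main__":
    for addr in ("someone@redhat.com", "someone@corp.redhat.com",
                 "someone@fedoraproject.org"):
        print(addr, "->", next_hop(addr))
```

Matching on the leading dot (`.redhat.com`) keeps unrelated domains that merely end in the same letters, such as `notredhat.com`, on the direct path.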

comment:28 Changed 4 years ago by skvidal

The internal hops are now switched over to bastion3. Postfix needs to stay up on bastion2 for a while, until we migrate everything over to PHX2.

comment:29 Changed 4 years ago by mmcgrath

  • Description modified (diff)

comment:30 Changed 4 years ago by nb

  • Cc nb added

comment:31 Changed 4 years ago by mdomsch

  • Cc mdomsch added

comment:32 Changed 4 years ago by jkratoch

  • Cc jkratoch added

comment:33 Changed 4 years ago by hadess

  • Cc hadess added

comment:34 Changed 4 years ago by mmcgrath

  • Status changed from assigned to closed
  • Resolution set to fixed

This is technically done. See https://fedorahosted.org/fedora-infrastructure/ticket/1884 for more information about tomorrow's outage.
