Ticket #1531 (closed outage: fixed)

Opened 6 years ago

Last modified 5 years ago

Master Mirror Issues

Reported by: ricky
Owned by: mmcgrath
Priority: critical
Component: General
Severity: The Sky Is Falling
Cc: robert@…, chrzczonowicz@…, ebrown@…, bruno@…, nb, onekopaka, lmacken, bjornts, timn, lkundrak, ctyler@…, alexlan, mjakubicek, braden@…

Description

We have been having issues getting a copy of the last Fedora updates push off of the netapp. Red Hat IT was notified and is still trying to determine the cause. They did report a steep increase in load on the master mirror machines:

http://mmcgrath.fedorapeople.org/server1-loadavg30min-2week.png

Currently, we are attempting to get a copy of the Fedora 10 and 11 updates off of the netapp to manually sync to Tier 0 mirrors; however, this is occurring extremely slowly due to the above issues.

Change History

comment:1 Changed 6 years ago by robert

  • Cc robert@… added

comment:2 Changed 6 years ago by tch

  • Cc chrzczonowicz@… added

comment:3 Changed 6 years ago by ebrown

  • Cc ebrown@… added

comment:4 Changed 6 years ago by bruno

  • Cc bruno@… added

comment:5 Changed 6 years ago by ricky

Unfortunately, we've canceled the manual sync, as it was not going to finish in a reasonable amount of time. The issue is being escalated within Red Hat IT, though.

Some mirror admins are currently looking into manually copying some commonly requested updated packages to mirrors for the time being, but this would only be a temporary measure.

comment:6 Changed 6 years ago by nb

  • Cc nb added

comment:7 Changed 6 years ago by onekopaka

  • Cc onekopaka added

comment:8 Changed 6 years ago by mmcgrath

  • Status changed from new to assigned

FWIW, tummy.com is up to date. nirik was clever enough to get the missing packages directly from our build system :)

I've encouraged the other mirrors to do a one-time sync off of it until the master is back up.

http://mirrors.tummy.com/pub/fedora.redhat.com/fedora/linux/updates/11/i386/

comment:9 Changed 6 years ago by lmacken

  • Cc lmacken added

comment:10 Changed 6 years ago by bjornts

  • Cc bjornts added

comment:11 Changed 6 years ago by timn

  • Cc timn added

comment:12 Changed 6 years ago by lkundrak

  • Cc lkundrak added

comment:13 Changed 6 years ago by ricky

Hi. Unfortunately, we have not received any significant updates from RH IT since last night. We have resumed the manual sync and hope to get at least some portion of the latest updates out to some main mirrors soon.

comment:14 Changed 6 years ago by ctyler

  • Cc ctyler@… added

comment:14 Changed 6 years ago by ricky

  • Cc ctyler@… removed

We have now gotten a full copy of F11 updates off of the netapp. We are currently speaking to some main mirrors and asking them to manually sync this. This should bring them up to date for Fedora 11. The other public mirrors should eventually pick up these changes from the main mirrors as well.

We are working on the same for F10 updates and, if the netapp issues aren't fixed by then, will eventually do so for F10 and F11 updates-testing as well.

comment:15 Changed 6 years ago by ricky

  • Cc ctyler@… added

Whoops, I accidentally removed somebody from the CC list; my mistake.

comment:16 Changed 6 years ago by alexlan

  • Cc alexlan added

comment:17 Changed 6 years ago by ricky

The latest F11 and F10 updates are both off the netapp now. F11 updates have been pushed out onto a fast Tier 0 mirror and are in the process of being synced down to the other tier mirrors. The F10 updates are in the process of syncing to a faster mirror from which they can be widely distributed downstream.

comment:18 Changed 6 years ago by mjakubicek

  • Cc mjakubicek added

comment:19 Changed 6 years ago by ricky

The situation with getting updates out has improved - we are still seeing significant slowness in building filelists when syncing from the master mirrors, but some mirrors have done successful syncs against the master mirrors.

We are currently working on pushing the full set of current updates to another fast alternate location for mirrors to sync from. That sync will hopefully be complete in a few hours.

comment:20 Changed 6 years ago by ricky

Another update - we just noticed a configuration mistake that was causing every Fedora 10/11 and EPEL updates push to change the timestamps on all files. This could have caused large amounts of extra checksumming on the master mirrors and accounted for some of the slowness in getting updates out. We've fixed the configuration mistake now, and timestamps should be fixed in the next push. Hopefully things should start to settle down at that point.
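For context on why the timestamp mistake mattered: rsync's default "quick check" skips a file only when its size and last-modified time match on both ends, so a push that rewrites mtimes makes every file a candidate for transfer (and delta checksumming) even when its contents are unchanged. The Python below is a minimal sketch of that comparison, not rsync's actual code; `needs_transfer`, `demo`, and the file names are hypothetical, and the sketch compares mtimes in whole seconds for simplicity.

```python
import os
import tempfile


def needs_transfer(src_path, dst_path):
    """Sketch of a size + mtime "quick check" (hypothetical helper, not rsync code).

    A file is skipped only if the destination exists and both the size and
    the modification time (compared in whole seconds here) match the source.
    """
    if not os.path.exists(dst_path):
        return True
    src, dst = os.stat(src_path), os.stat(dst_path)
    return src.st_size != dst.st_size or int(src.st_mtime) != int(dst.st_mtime)


def demo():
    """Show that changing only the timestamp re-selects an identical file."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "master.rpm")  # hypothetical master copy
        dst = os.path.join(tmp, "mirror.rpm")  # hypothetical mirror copy
        for path in (src, dst):
            with open(path, "wb") as f:
                f.write(b"identical package bytes")
        # Give both copies the same mtime, as after a clean sync.
        os.utime(src, (1_000_000_000, 1_000_000_000))
        os.utime(dst, (1_000_000_000, 1_000_000_000))
        before = needs_transfer(src, dst)  # same size and mtime: skipped
        # A push that only rewrites the timestamp on the master side.
        os.utime(src, (1_000_000_500, 1_000_000_500))
        after = needs_transfer(src, dst)  # same bytes, but now selected
        return before, after
```

Calling `demo()` returns `(False, True)`: the identical file is skipped while the timestamps agree and selected again once only the source mtime moves, which, multiplied across a whole updates tree, is the kind of extra work described above.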

comment:21 Changed 6 years ago by ricky

Sorry there's not much of an update today; we're still waiting for the current updates push to finish so that timestamps will be sane on the mirrors again.

comment:22 Changed 6 years ago by ricky

The updates push with the fixed timestamps has completed. These fixes are now being synced to our various mirrors.

comment:23 Changed 6 years ago by braden

  • Cc braden@… added

comment:24 Changed 6 years ago by ricky

One thing that I forgot to mention in this ticket: we originally had three netapps in different locations, but, unfortunately, two of them were down at the beginning of this issue.

Throughout all of the other work we've been doing, people in Red Hat have been trying to get those back up as well, so that we wouldn't be left with a single point of failure like this.

comment:25 Changed 6 years ago by ricky

Update: Mike did some tests with the actimeo NFS option and achieved enormous improvements in filelist generation time. We are now testing these options out on the master mirror rsync servers. Hopefully this will improve mirror syncing and allow us to raise the connection limits further without killing these machines.

comment:26 Changed 6 years ago by ricky

The actimeo NFS option (along with noatime, nosuid, and nodev) has been added to all 4 rsync servers. Initial testing shows that filelist generation times have dropped to less than half of what we were seeing before the changes were made. Connection limits are now set at 7 per server, for a total of 28 slots.

comment:27 Changed 6 years ago by ricky

One more update - connection limits are now being raised to 12 connections x 4 servers (48 slots in total).

comment:28 Changed 6 years ago by ricky

Connection limits are now 12 connections x 5 servers (60 slots in total). We're starting to ask mirrors whether they are seeing pulls from the master go back to normal.

comment:29 Changed 6 years ago by ricky

  • Type changed from bug to outage

From the end-user perspective, things seem to have returned to normal over the past few days, and users should hopefully be receiving updates normally. We will be doing some more testing before we send out an email announcing that the issue has ended.

comment:30 Changed 6 years ago by ricky

Sorry I haven't had an update on this in a while - as you've seen, things have pretty much gone back to normal with updates in the past few weeks. Right now, we are working on getting the Internet2-connected master back up so that we won't have a single point of failure with the master mirrors.

comment:31 Changed 5 years ago by mdomsch

  • Resolution set to fixed
  • Status changed from assigned to closed

I2 mirror is back up now, and seems to be working as expected (routing to I2 and NLR working).

I'm going to close this now. Open a new ticket if problems recur.
