#1531 Master Mirror Issues
Closed: Fixed None Opened 14 years ago by ricky.

We have been having issues getting a copy of the last Fedora updates push off of the netapp. Red Hat IT was notified and they are still trying to determine the cause. They did report a steep increase in load on the master mirror machines:

http://mmcgrath.fedorapeople.org/server1-loadavg30min-2week.png

Currently, we are attempting to get a copy of the Fedora 10 and 11 updates off of the netapp to manually sync to Tier 0 mirrors; however, this is occurring extremely slowly due to the above issues.


Unfortunately, we've canceled the manual sync, as it was not going to finish in a reasonable amount of time. The issue is being escalated in Red Hat IT though.

Some mirror admins are currently looking into the possibility of manually copying some commonly requested updated packages to mirrors for the time being, but this would just be a temporary course of action.

FWIW, tummy.com is up to date. nirik was clever enough to get the missing packages directly from our build system :)

I've encouraged the other mirrors to do a one time sync off of it until master is back up.

http://mirrors.tummy.com/pub/fedora.redhat.com/fedora/linux/updates/11/i386/

Hi, unfortunately, we have not received any significant updates from RH IT since last night. We have resumed the manual sync, and hope that we will be able to get at least some portion of the latest updates out to some main mirrors soon.

We have now gotten a full copy of F11 updates off of the netapp. We are currently speaking to some main mirrors and asking them to manually sync this. This should bring them up to date for Fedora 11. The other public mirrors should eventually pick up these changes from the main mirrors as well.

We are working on similar actions for F10 updates, and eventually, if the netapp issues aren't fixed by then, F10 and F11 updates-testing.

Whoops, I accidentally removed somebody from the CC list, my mistake.

The latest F11 and F10 updates are both off the netapp now. The F11 updates have been pushed out onto a fast Tier 0 mirror and are in the process of being synced down to other tier mirrors. The F10 updates are in the process of syncing to a faster mirror from which they can be widely distributed downstream.

The situation with getting updates out has improved - we are still seeing significant slowness in building filelists when syncing from the master mirrors, but some mirrors have done successful syncs against the master mirrors.

We are currently working on pushing the full set of current updates to another alternate fast location for mirrors to sync from. That sync will hopefully be complete in a few hours.

Another update - we just noticed a configuration mistake that was causing all Fedora 10/11 and EPEL updates to change timestamps on all files. This could have caused large amounts of extra checksumming on the master mirrors and accounted for some of the slowness in getting updates out. We've fixed the configuration mistake now, and timestamps should be fixed in the next push. Hopefully things should start to settle down at that point.
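
To illustrate why touched timestamps are expensive: by default rsync skips a file only when its size and modification time both match the copy on the receiving side, so a push that changes every mtime makes every file look changed. The sketch below mimics that rule in Python. It is not rsync's code; `FileMeta`, `needs_transfer`, `count_flagged`, and the example paths are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class FileMeta:
    """Size in bytes and modification time (seconds since the epoch)."""
    size: int
    mtime: int


def needs_transfer(source: FileMeta, dest: Optional[FileMeta]) -> bool:
    """Approximate rsync's default quick check: skip only if size and mtime match."""
    if dest is None:
        return True  # file is missing on the mirror
    return source.size != dest.size or source.mtime != dest.mtime


def count_flagged(master: Dict[str, FileMeta], mirror: Dict[str, FileMeta]) -> int:
    """Count files on the master that the quick check would not skip."""
    return sum(
        1 for path, meta in master.items()
        if needs_transfer(meta, mirror.get(path))
    )


# Hypothetical two-file tree; the paths and values are illustrative only.
mirror = {
    "updates/11/i386/example-a.rpm": FileMeta(size=1000, mtime=1_240_000_000),
    "updates/11/i386/example-b.rpm": FileMeta(size=2000, mtime=1_240_000_500),
}
in_sync_master = dict(mirror)
# Same content, but every timestamp was bumped by the push.
touched_master = {
    path: FileMeta(meta.size, meta.mtime + 3600) for path, meta in mirror.items()
}

print(count_flagged(in_sync_master, mirror))   # 0
print(count_flagged(touched_master, mirror))   # 2
```

With unchanged timestamps nothing is flagged; with bumped timestamps every file is, even though no content changed.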

Sorry there's not much of an update today, we're still waiting for the current updates push to finish so that timestamps will be sane on the mirrors again.

The updates push with the fixed timestamps has completed. These fixes are now being synced to our various mirrors.

One thing that I forgot to mention in this ticket. We originally had three netapps in different locations, but most unfortunately, two of them were down at the beginning of this issue.

Throughout all of the other stuff we've been doing, people in Red Hat have been trying to get those back up as well so that we wouldn't be at a single point of failure like this.

Update: Mike did some tests with the actimeo nfs option, and achieved enormous improvements in file generation time. We are now testing these options out on the master mirror rsync servers. Hopefully it will help to improve mirror syncing abilities and allow us to bump up the connection limits further without killing these machines.

The actimeo NFS option (along with noatime, nosuid, and nodev) has been added to all 4 rsync servers. Initial testing shows that filelist generation times have dropped to less than half of what we were seeing before the changes were made. Connection limits are now set at 7/server for a total of about 28 slots.
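
For anyone wanting to try the same options, here is a rough sketch of how the mount option string could be assembled. The actimeo value used on the master mirrors is not given in this ticket, so the 300 seconds below is purely a placeholder, and the server, export path, and mount point are hypothetical. Per nfs(5), actimeo sets acregmin, acregmax, acdirmin, and acdirmax to the same value.

```python
def nfs_mount_options(actimeo_seconds: int) -> str:
    """Join the four options named in this ticket into one mount option string."""
    options = [
        f"actimeo={actimeo_seconds}",  # cache file and directory attributes this long
        "noatime",                     # do not update access times on reads
        "nosuid",                      # ignore set-user-ID and set-group-ID bits
        "nodev",                       # do not interpret device files
    ]
    return ",".join(options)


def fstab_line(export: str, mount_point: str, options: str) -> str:
    """Format a single /etc/fstab entry for an NFS mount."""
    return f"{export} {mount_point} nfs {options} 0 0"


# Placeholder value and hypothetical paths; adjust for a real setup.
opts = nfs_mount_options(300)
print(fstab_line("netapp.example.com:/vol/fedora", "/pub/fedora", opts))
```

A longer attribute cache means fewer attribute lookups against the netapp while rsync walks the tree to build its file list, at the cost of the client noticing changes made elsewhere later.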

One more update - connection limits are now being raised to 12 connections x 4 servers.

Connection limits are now 12 connections x 5 servers. We're starting to ask mirrors if they are seeing pulls from the master go back to normal.
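
For reference, the total number of sync slots at each stage mentioned in this ticket works out as follows. This is only a quick arithmetic sketch; the stage labels are my own wording.

```python
def total_slots(connections_per_server: int, servers: int) -> int:
    """Total concurrent rsync connections across all master mirror servers."""
    return connections_per_server * servers


# (label, connections per server, number of servers) as reported in this ticket.
stages = [
    ("after the NFS option change", 7, 4),
    ("first increase", 12, 4),
    ("second increase", 12, 5),
]

for label, per_server, servers in stages:
    print(f"{label}: {per_server} x {servers} = {total_slots(per_server, servers)} slots")
```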

Things from the end user perspective seem to have returned to normal in the past few days, and users should hopefully be receiving updates normally. We will be doing some more testing before we send out an email that the issue has ended.

Sorry I haven't had an update on this in a while - as you've seen, things have pretty much gone back to normal with updates in the past few weeks. Right now, we are working on getting the Internet2-connected master back up so that we won't have a single point of failure with the master mirrors.

I2 mirror is back up now, and seems to be working as expected (routing to I2 and NLR working).

I'm going to close this now. Open a new ticket if problems recur.
