MirrorManager code running on app4 hitting db2 is failing. This caused the mirrorlist and publiclist pages to stop refreshing on 5/10, and the MM admin web interface is unusable.
I killed and restarted start-mirrors on app4, which seems to have resolved it for now.
I'm going to re-open this, because it's still taking >10 minutes to render each publiclist page, when it usually takes about 10 minutes to render all the publiclist pages.
and now it's not completing any page renders again at all. :-(
{{{
mirrormanager=# select * from pgstattuple('host_category_dir');
-[ RECORD 1 ]------+----------
table_len          | 733134848
tuple_count        | 200905
tuple_len          | 17617190
tuple_percent      | 2.4
dead_tuple_count   | 7408720
dead_tuple_len     | 649852556
dead_tuple_percent | 88.64
free_space         | 2734892
free_percent       | 0.37
}}}
The hourly cron job that's vacuuming these tables has to go through three that are very large. And it's still chugging away at the first one after most of an hour.
I think that the db needs a vacuum full to trim off the 600MB of dead tuples. I'm shutting down the mirrormanager admin interface and doing that to deal with this. Holler if something besides the admin interface breaks because of this.
Alright. We're back to normal operations for now:
{{{
mirrormanager=# select * from pgstattuple('host_category_dir');
-[ RECORD 1 ]------+---------
table_len          | 45891584
tuple_count        | 201854
tuple_len          | 17676390
tuple_percent      | 38.52
dead_tuple_count   | 18510
dead_tuple_len     | 1523848
dead_tuple_percent | 3.32
free_space         | 23678696
free_percent       | 51.6
}}}
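For anyone reading these numbers later, here is a minimal sketch that recomputes the percentage columns from the raw byte counts in the two pgstattuple outputs in this ticket. The function name `bloat_summary` is made up for illustration; the inputs are copied from the outputs before and after the vacuum full.

```python
def bloat_summary(table_len, tuple_len, dead_tuple_len, free_space):
    """Return (live %, dead %, free %) of the table's on-disk size."""
    def pct(part):
        return round(100.0 * part / table_len, 2)
    return pct(tuple_len), pct(dead_tuple_len), pct(free_space)


# Byte counts copied from the pgstattuple output before the vacuum full.
before = bloat_summary(733134848, 17617190, 649852556, 2734892)
# Byte counts copied from the pgstattuple output after the vacuum full.
after = bloat_summary(45891584, 17676390, 1523848, 23678696)

print("before: live=%s%% dead=%s%% free=%s%%" % before)
print("after:  live=%s%% dead=%s%% free=%s%%" % after)
# Bytes returned to the filesystem by the vacuum full.
print("reclaimed bytes:", 733134848 - 45891584)
```

The live data is about 17.6MB in both runs; the difference in table_len is almost entirely dead tuples.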
mdomsch, do you have a MirrorManager SOP somewhere? If so, this recipe should probably go in there somewhere:
Note that we're going to have to figure out what's going wrong and how we can fix this eventually. Some things to explore:
 * Does the database gradually get worse, or does something get out of whack and then it gets worse quickly once that happens?
   * Test this by taking periodic pgstattuples to see if the cron job is able to keep the total table size relatively constant (i.e. dead tuples may grow, but the cron job should vacuum those and allocate them to free space to be reused).
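The periodic check suggested above could be evaluated along these lines. This is only a sketch: the `Sample` type, the `table_is_stable` helper and the 10% tolerance are assumptions, and the sample values are synthetic round numbers for illustration, not measurements from db2. In practice each sample would come from one run of `select * from pgstattuple('host_category_dir')`.

```python
from typing import NamedTuple, Sequence


class Sample(NamedTuple):
    """One pgstattuple reading (byte counts only)."""
    table_len: int
    dead_tuple_len: int
    free_space: int


def table_is_stable(samples: Sequence[Sample], tolerance: float = 0.10) -> bool:
    """True if table_len never exceeds the first sample by more than tolerance.

    Dead tuples may rise and fall between vacuums; what matters is that
    the vacuum turns them into reusable free space, so the total size
    stays roughly flat.
    """
    baseline = samples[0].table_len
    limit = baseline * (1 + tolerance)
    return all(s.table_len <= limit for s in samples)


# Synthetic readings: dead tuples get recycled into free space.
healthy = [
    Sample(table_len=50_000_000, dead_tuple_len=2_000_000, free_space=20_000_000),
    Sample(table_len=50_000_000, dead_tuple_len=9_000_000, free_space=13_000_000),
    Sample(table_len=51_000_000, dead_tuple_len=1_000_000, free_space=22_000_000),
]
# Synthetic readings: vacuum is not reclaiming anything, the table keeps growing.
bloating = [
    Sample(table_len=50_000_000, dead_tuple_len=2_000_000, free_space=20_000_000),
    Sample(table_len=90_000_000, dead_tuple_len=60_000_000, free_space=1_000_000),
]

print(table_is_stable(healthy))
print(table_is_stable(bloating))
```

If the size stays flat for days and then jumps, that points at something getting out of whack rather than gradual decay.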
Some ideas for things that may make this better at the mirrormanager level:
Note that 1 and 2 are predicated on there being some sort of contention between the sync script's selects and the vacuums being run from cron. This has not been proven to be true so they might not help. 3 should be helpful no matter what the cause of this issue but I don't know how realistic it is or if it will cause performance problems within mirrormanager.
Log of the vacuum full run mirrormanager-full.txt
Attached the log of the vacuum full run. Next time this occurs we may want to log the output of the normal vacuum before doing the vacuum full, to see if we can get more info on why it's not clearing out the dead tuples. This time around, I just let the cron job invoke vacuum and then noted via pgstattuple afterwards that it didn't seem to have done anything.
Command that will produce logs but otherwise behave like the cron job (note the table name uses underscores, matching the pgstattuple output above):
{{{
sudo -u postgres /usr/bin/vacuumdb -v -d mirrormanager -t host_category_dir
}}}
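To keep the verbose output around for next time, the command can be wrapped so everything lands in a file. A rough sketch, assuming the table is `host_category_dir` as shown by pgstattuple; the helper names and the log path are hypothetical. stderr is merged into stdout so both streams end up in the log.

```python
import subprocess


def vacuum_command(database, table):
    """Build the verbose vacuumdb invocation used in this ticket."""
    return ["sudo", "-u", "postgres", "/usr/bin/vacuumdb",
            "-v", "-d", database, "-t", table]


def run_and_log(cmd, log_path):
    """Run cmd, writing stdout and stderr to log_path; return the exit code."""
    with open(log_path, "w") as log:
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


cmd = vacuum_command("mirrormanager", "host_category_dir")
print(" ".join(cmd))
# On the database host (hypothetical log path):
# run_and_log(cmd, "/tmp/mirrormanager-vacuum.txt")
```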
There are 3 places in the MM codepath that update host_category_dir tuples unnecessarily: twice in the crawler, and once in the report_mirror checkin. The patch below is queued to go in after the change freeze, to eliminate these extraneous updates. This should have a dramatic impact on the growth of that table's dead rows.
diff --git a/mirrors/crawler_perhost b/mirrors/crawler_perhost
index a264424..c31b4b8 100755
--- a/mirrors/crawler_perhost
+++ b/mirrors/crawler_perhost
@@ -298,10 +298,11 @@ def sync_hcds(host, host_category_dirs):
             if hcd.directory is None:
                 hcd.directory = d
             hcd.sync()
+
     # now-historical HostCategoryDirs are not up2date
     # we wait for a cascading Directory delete to delete this
@@ -312,9 +313,9 @@ def sync_hcds(host, host_category_dirs):
         try:
             thcd = current_hcds[hcd]
         except KeyError:
-            hcd.lastCrawled=datetime.utcnow()
-            hcd.up2date=False
-            hcd.sync()
+            if hcd.up2date != False:
+                hcd.up2date=False
+                hcd.sync()
 
 def method_pref(urls):
diff --git a/mirrors/mirrors/model.py b/mirrors/mirrors/model.py
index c28d0df..8c2e74a 100644
--- a/mirrors/mirrors/model.py
+++ b/mirrors/mirrors/model.py
@@ -241,9 +241,10 @@ class Host(SQLObject):
                 if hcdir.count() > 0:
                     hcdir = hcdir[0]
                     # don't store files, we don't need it right now
-                    hcdir.files = None
-                    hcdir.up2date = True
-                    hcdir.sync()
+                    # hcdir.files = None
+                    if hcdir.up2date != True:
+                        hcdir.up2date = True
+                        hcdir.sync()
                     marked_up2date += 1
                 else:
                     if len(d) > 0:
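The idea behind the patch can be shown in isolation. PostgreSQL writes a new row version for every UPDATE, even when the new values equal the old ones, and the old version becomes a dead tuple until vacuum reclaims it. The sketch below is not MirrorManager code: `FakeRow` is a stand-in for a SQLObject row that only counts how many times `sync()` would have hit the database.

```python
class FakeRow:
    """Stand-in for a database row; counts simulated UPDATE statements."""

    def __init__(self, up2date):
        self.up2date = up2date
        self.writes = 0

    def sync(self):
        # In the real model this would issue an UPDATE.
        self.writes += 1


def mark_unconditionally(row, value):
    """Old behaviour: always write, even if nothing changed."""
    row.up2date = value
    row.sync()


def mark_if_changed(row, value):
    """Patched behaviour: write only when the flag actually changes."""
    if row.up2date != value:
        row.up2date = value
        row.sync()


# Four rows, three of which are already up to date.
old_rows = [FakeRow(True), FakeRow(True), FakeRow(True), FakeRow(False)]
new_rows = [FakeRow(True), FakeRow(True), FakeRow(True), FakeRow(False)]

for r in old_rows:
    mark_unconditionally(r, True)
for r in new_rows:
    mark_if_changed(r, True)

old_writes = sum(r.writes for r in old_rows)
new_writes = sum(r.writes for r in new_rows)
print("unconditional writes:", old_writes)
print("conditional writes:  ", new_writes)
```

Since most directories are already in the right state on any given crawl or checkin, skipping the no-op writes should remove most of the updates against host_category_dir.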