#3268 [HOTFIX] MM crawler not removing stale mirror directories
Closed: Fixed. Opened 11 years ago by ertzing.


I've marked it as not user_active for now. I still need to figure out why the crawler isn't throwing it out, though. They're not running report_mirror regularly.

Should we leave this open to track that issue, file a new ticket for it, or just close this one?

I don't see a 'clone' feature. :-( Leaving it open and changing the description.

This is an MM crawler bug, specifically in crawler_perhost, function add_parents(). The try: clause must always be failing. If it ever succeeded, parentDir would be unbound when it is referenced a few lines later, which would raise a traceback. We aren't getting a traceback, so the try: clause must be failing every time, and the except: branch then marks the parent directories as up-to-date even when they were already discovered not to be.

In this particular case, the 10 most recent epel/5/i386/debug/repoview/* files are in fact identical between the historical copies from mid-February and now, given how those files are regenerated each day. This bug causes repoview/ and all of its parents to be (incorrectly) marked up-to-date.
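The failure described above can be reproduced outside MirrorManager. This is a minimal sketch of the pre-fix add_parents() control flow, assuming hypothetical stand-ins: a `DIRECTORIES` dict in place of Directory.byName(), and small `Directory`, `Category`, and `HostCategory` classes in place of the SQLObject models. Unlike the original, it initializes `parent_dir` before the `try:`; the original left it unset, which is why a successful lookup would have raised an exception.

```python
class Directory:
    """Hypothetical stand-in for MirrorManager's Directory row (hashed by identity)."""
    def __init__(self, name):
        self.name = name

class Category:
    """Hypothetical stand-in carrying only the category's top directory."""
    def __init__(self, topdir):
        self.topdir = topdir

class HostCategory:
    """Hypothetical stand-in linking a host to a Category."""
    def __init__(self, category):
        self.category = category

# Hypothetical name -> Directory registry standing in for Directory.byName()
DIRECTORIES = {}

def register(name):
    DIRECTORIES[name] = Directory(name)
    return DIRECTORIES[name]

def old_add_parents(host_category_dirs, hc, d):
    """Pre-fix add_parents() control flow."""
    splitpath = d.name.split("/")
    if len(splitpath[:-1]) > 0:
        parent = "/".join(splitpath[:-1])
        parent_dir = None  # added here; the original left this unset
        try:
            # Bug: the key holds the parent's name (a str), but entries are keyed
            # by Directory objects, so this lookup always raises KeyError.
            host_category_dirs[(hc, parent)]
        except KeyError:
            parent_dir = DIRECTORIES.get(parent)  # None once we leave the tree
            if parent_dir is not None:
                # Reached on every call, so a parent already marked stale
                # (False) is overwritten with True.
                host_category_dirs[(hc, parent_dir)] = True
        if parent_dir is not None and parent_dir != hc.category.topdir:
            return old_add_parents(host_category_dirs, hc, parent_dir)
    return host_category_dirs

epel = register("epel")
five = register("epel/5")
i386 = register("epel/5/i386")
debug = register("epel/5/i386/debug")
repoview = register("epel/5/i386/debug/repoview")
hc = HostCategory(Category(topdir=epel))

# debug/ was already found stale; repoview/ matched the historical files.
state = {(hc, debug): False, (hc, repoview): True}
old_add_parents(state, hc, repoview)
```

Starting from a state where debug/ is known to be stale, a single call marks debug/ and every parent up to epel/ as up-to-date (True).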

Now to figure out how to fix the crawler logic...

In the try: clause, it was doing the key lookup with the directory name instead of the Directory object, so the lookup never matched. Oops. Trying out a fix now.
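The difference between the two keys can be shown in isolation. In this minimal sketch, `Directory` is a hypothetical stand-in class (the real one is an SQLObject row) and `hc` is a plain string standing in for a HostCategory; only the shape of the dictionary key matters.

```python
class Directory:
    """Hypothetical stand-in for MirrorManager's Directory object (hashed by identity)."""
    def __init__(self, name):
        self.name = name

hc = "host-category-1"  # hypothetical placeholder for a HostCategory
debug = Directory("epel/5/i386/debug")

# debug/ has already been discovered to be stale on this host.
host_category_dirs = {(hc, debug): False}

# Buggy key: the directory name (a str) never equals a Directory object.
found_by_name = (hc, debug.name) in host_category_dirs

# Fixed key: the Directory object itself, matching how entries were stored.
found_by_object = (hc, debug) in host_category_dirs
```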

This appears to be working. I'll also submit it as a hotfix...

{{{
--- crawler_perhost 2010-09-06 14:46:21.000000000 +0000
+++ /home/mirrormanager/crawler_perhost 2012-05-12 01:20:54.604906708 +0000
@@ -348,21 +348,24 @@
             break
     return pref
 
-
-def add_parents(host_category_dirs, hc, d):
-    splitpath = d.name.split('/')
+def parent(directory):
+    parentDir = None
+    splitpath = directory.name.split(u'/')
     if len(splitpath[:-1]) > 0:
-        parent = '/'.join(splitpath[:-1])
+        parentPath = u'/'.join(splitpath[:-1])
         try:
-            hcd = host_category_dirs[(hc, parent)]
-        except KeyError:
-            try:
-                parentDir = Directory.byName(parent)
-                host_category_dirs[(hc, parentDir)] = True
-            except SQLObjectNotFound: # recursed out of the directory structure
-                parentDir = None
-
-        if parentDir and parentDir != hc.category.topdir: # stop at top of the category
+            parentDir = Directory.byName(parentPath)
+        except SQLObjectNotFound:
+            pass
+    return parentDir
+
+def add_parents(host_category_dirs, hc, d):
+    parentDir = parent(d)
+    if parentDir is not None:
+        if (hc, parentDir) not in host_category_dirs:
+            print "directory %s adding parent %s, unknown up2date state" % (d.name, (hc, parentDir))
+            host_category_dirs[(hc, parentDir)] = None
+        if parentDir != hc.category.topdir: # stop at top of the category
             return add_parents(host_category_dirs, hc, parentDir)
 
     return host_category_dirs

}}}
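To see how the patched add_parents() behaves without a database, here is a sketch of the same control flow. It assumes hypothetical stand-ins: a `DIRECTORIES` dict replaces Directory.byName() (a missing name plays the role of SQLObjectNotFound), and small `Directory`, `Category`, and `HostCategory` classes carry only the attributes the patch touches. The debugging print is omitted.

```python
class Directory:
    """Hypothetical stand-in for MirrorManager's Directory row (hashed by identity)."""
    def __init__(self, name):
        self.name = name

class Category:
    """Hypothetical stand-in carrying only the category's top directory."""
    def __init__(self, topdir):
        self.topdir = topdir

class HostCategory:
    """Hypothetical stand-in linking a host to a Category."""
    def __init__(self, category):
        self.category = category

# Hypothetical name -> Directory registry replacing Directory.byName();
# a missing name plays the role of SQLObjectNotFound.
DIRECTORIES = {}

def register(name):
    DIRECTORIES[name] = Directory(name)
    return DIRECTORIES[name]

def parent(directory):
    """Return the parent Directory, or None if there is no known parent."""
    splitpath = directory.name.split("/")
    if len(splitpath[:-1]) > 0:
        return DIRECTORIES.get("/".join(splitpath[:-1]))
    return None

def add_parents(host_category_dirs, hc, d):
    """Walk up from d, recording parents whose up-to-date state is unknown."""
    parent_dir = parent(d)
    if parent_dir is not None:
        if (hc, parent_dir) not in host_category_dirs:
            # Unknown state: record None rather than assuming up-to-date
            host_category_dirs[(hc, parent_dir)] = None
        if parent_dir != hc.category.topdir:  # stop at top of the category
            return add_parents(host_category_dirs, hc, parent_dir)
    return host_category_dirs

epel = register("epel")
five = register("epel/5")
i386 = register("epel/5/i386")
debug = register("epel/5/i386/debug")
repoview = register("epel/5/i386/debug/repoview")
hc = HostCategory(Category(topdir=epel))

# debug/ is already known to be stale; repoview/ looked current.
state = {(hc, debug): False, (hc, repoview): True}
add_parents(state, hc, repoview)
```

With debug/ already recorded as stale, the walk leaves that entry alone and adds i386/, 5/ and epel/ with a state of None, stopping at the category's top directory. No parent is promoted to up-to-date merely because one of its children is.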

Marking this as a hotfix so we can track it.

We have moved to MirrorManager 1.4, so this hotfix is no longer needed.
