= bug description =
Consider the following call:
$ GET 'http://mirrors.fedoraproject.org/mirrorlist?repo=epel-5&arch=i386'
http://mirror.de.leaseweb.net/epel/5/i386/ http://fedora.tu-chemnitz.de/pub/linux/fedora-epel/5/i386/ http://ftp-stud.hs-esslingen.de/pub/epel/5/i386/ http://vesta.informatik.rwth-aachen.de/ftp/pub/Linux/fedora-epel/5/i386/ http://epel.uni-oldenburg.de/5/i386/ http://mirror.fraunhofer.de/download.fedora.redhat.com/epel/5/i386/ http://fedora.kiewel-online.ch/epel/5/i386/ http://ftp.uni-koeln.de/mirrors/fedora/epel/5/i386/
http://mirror.de.leaseweb.net/epel/5/i386/ is an outdated mirror (last updated February 19). It should be removed from the list.
= bug analysis =
= fix recommendation =
Sorry about the mess.
{{{
$ GET 'http://mirrors.fedoraproject.org/mirrorlist?repo=epel-5&arch=i386'
http://mirror.de.leaseweb.net/epel/5/i386/
http://fedora.tu-chemnitz.de/pub/linux/fedora-epel/5/i386/
http://ftp-stud.hs-esslingen.de/pub/epel/5/i386/
http://vesta.informatik.rwth-aachen.de/ftp/pub/Linux/fedora-epel/5/i386/
http://epel.uni-oldenburg.de/5/i386/
http://mirror.fraunhofer.de/download.fedora.redhat.com/epel/5/i386/
http://fedora.kiewel-online.ch/epel/5/i386/
http://ftp.uni-koeln.de/mirrors/fedora/epel/5/i386/
}}}
I've marked it not user_active for now. Need to figure out why the crawler isn't throwing it out though. They're not running report_mirror regularly.
Should we leave this open to track that issue, file a new ticket for it, or just close this one?
I don't see a 'clone' feature. :-( Leaving it open, changing description.
This is a MirrorManager (MM) crawler bug, specifically in the add_parents() function of crawler_perhost. If the try: clause ever succeeded, parentDir would be unset when it is looked up a few lines later, and we would get a traceback. We aren't getting a traceback, so the try: clause must be failing every time, which causes parent directories to be marked up-to-date even if they were already found to be out of date.
In this particular case, the most recent 10 epel/5/i386/debug/repoview/* files are in fact identical between the historical files of mid-February and now, given how those files are regenerated each day. This bug causes repoview/ and all its parents to be (incorrectly) marked up-to-date.
Now to figure out how to fix the crawler logic...
In the try: clause, it was doing a key lookup using the directory name, not the directory object. Oops. Trying out a fix now.
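The failure mode described above can be reproduced in isolation. This is only a minimal sketch in Python 3: the `Directory` class and the `hc` placeholder are hypothetical stand-ins, not the real SQLObject-backed MirrorManager classes. It shows why a lookup keyed on the directory name can never hit an entry keyed on the directory object, and how the old fallback then overwrote a known out-of-date state.

```python
# Hypothetical stand-ins; not the real MirrorManager/SQLObject classes.
class Directory:
    def __init__(self, name):
        self.name = name

hc = "host-category"  # placeholder for a HostCategory object
repoview = Directory("epel/5/i386/debug/repoview")
parent_dir = Directory("epel/5/i386/debug")

# State is keyed on (hc, Directory object); False means "not up to date".
host_category_dirs = {(hc, parent_dir): False}

parent_name = "/".join(repoview.name.split("/")[:-1])

# Old logic: look up by name (a string), which never matches an object key.
try:
    host_category_dirs[(hc, parent_name)]
    lookup_failed = False
except KeyError:
    lookup_failed = True
    # The fallback then unconditionally marked the parent up to date.
    host_category_dirs[(hc, parent_dir)] = True

print(lookup_failed)
print(host_category_dirs[(hc, parent_dir)])
```

Both values come out `True`: the lookup always fails, and the parent that had already been found stale is flipped to up-to-date.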
This appears to be working. I'll also submit as a hotfix...
{{{
--- crawler_perhost 2010-09-06 14:46:21.000000000 +0000
+++ /home/mirrormanager/crawler_perhost 2012-05-12 01:20:54.604906708 +0000
@@ -348,21 +348,24 @@
             break
     return pref
 
-
-def add_parents(host_category_dirs, hc, d):
-    splitpath = d.name.split('/')
+def parent(directory):
+    parentDir = None
+    splitpath = directory.name.split(u'/')
     if len(splitpath[:-1]) > 0:
-        parent = '/'.join(splitpath[:-1])
+        parentPath = u'/'.join(splitpath[:-1])
         try:
-            hcd = host_category_dirs[(hc, parent)]
-        except KeyError:
-            try:
-                parentDir = Directory.byName(parent)
-                host_category_dirs[(hc, parentDir)] = True
-            except SQLObjectNotFound: # recursed out of the directory structure
-                parentDir = None
-
-        if parentDir and parentDir != hc.category.topdir: # stop at top of the category
+            parentDir = Directory.byName(parentPath)
+        except SQLObjectNotFound:
+            pass
+    return parentDir
+
+def add_parents(host_category_dirs, hc, d):
+    parentDir = parent(d)
+    if parentDir is not None:
+        if (hc, parentDir) not in host_category_dirs:
+            print "directory %s adding parent %s, unknown up2date state" % (d.name, (hc, parentDir))
+            host_category_dirs[(hc, parentDir)] = None
+        if parentDir != hc.category.topdir: # stop at top of the category
             return add_parents(host_category_dirs, hc, parentDir)
 
     return host_category_dirs
}}}
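For readers who want to see the patched behaviour without a MirrorManager database, here is a rough Python 3 sketch of the same idea. It makes several assumptions: `Directory.by_name()` is a hypothetical in-memory stand-in for `Directory.byName()`, `topdir` is passed in as a parameter instead of being read from `hc.category.topdir`, and `hc` is just a placeholder string. It is not the deployed code.

```python
# Hypothetical stand-ins for the SQLObject-backed classes used by crawler_perhost.
class Directory:
    _by_name = {}

    def __init__(self, name):
        self.name = name
        Directory._by_name[name] = self

    @classmethod
    def by_name(cls, name):
        # Stand-in for Directory.byName(); returns None instead of raising.
        return cls._by_name.get(name)


def parent(directory):
    """Return the parent Directory object, or None if there is none."""
    parts = directory.name.split("/")
    if len(parts) > 1:
        return Directory.by_name("/".join(parts[:-1]))
    return None


def add_parents(host_category_dirs, hc, d, topdir):
    """Register the parents of d without overwriting a state already recorded."""
    parent_dir = parent(d)
    if parent_dir is not None:
        # An existing True/False is left alone; new parents get None (unknown).
        host_category_dirs.setdefault((hc, parent_dir), None)
        if parent_dir is not topdir:  # stop at the top of the category
            return add_parents(host_category_dirs, hc, parent_dir, topdir)
    return host_category_dirs


topdir = Directory("epel")
five = Directory("epel/5")
i386 = Directory("epel/5/i386")
leaf = Directory("epel/5/i386/repoview")

hc = "host-category"
state = {(hc, i386): False}  # already found to be out of date
add_parents(state, hc, leaf, topdir)
```

The key is now the directory object, so the existing `False` for `epel/5/i386` survives, while parents that had not been seen yet are recorded with `None` rather than being assumed up-to-date.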
Marking this as a hotfix so we can track it.
We have moved to mirrormanager 1.4, so this hotfix is no longer needed.