Can we get a proper robots.txt deployed on fedorahosted? We're hitting several performance issues, and until we can figure out a plan for dealing with them I'd like to restrict Google a bit. Specifically, I want to disallow:
https://fedorahosted.org/*/browser/*
We may need to use mod_rewrite based on user agent to enforce this; I'm not sure. I'm thinking about doing the same for gitweb's snapshot links for crawlers.
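For reference, a minimal robots.txt rule for this would look something like the following. Note that the `*` wildcard inside a Disallow path is a Google/Bing extension, not part of the original robots.txt convention, which is exactly why a mod_rewrite fallback may be needed for less capable bots:

```
User-agent: *
Disallow: /*/browser/
```

And a sketch of the mod_rewrite fallback, keyed on user agent (the bot names and the exact path pattern here are illustrative assumptions, not the deployed config):

```
# Hypothetical Apache fragment: refuse Trac source-browser requests
# from known crawlers that might ignore the robots.txt wildcard rule.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (Googlebot|bingbot|Baiduspider) [NC]
RewriteRule ^/[^/]+/browser(/|$) - [F,L]
```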
attachment fh.o-robots.txt.patch
I know we are currently frozen, but this looks relatively simple and can be applied once we are unfrozen.
I am a newbie to Fedora's puppet practices, so I've attached a patch for review/comment/flame :)
As for the gitweb side: I notice some of the other F/LOSS gitweb instances are disallowing everything via robots.txt. See: http://git.kernel.org/robots.txt and http://git.postgresql.org/robots.txt
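The blanket approach those sites use is just the two-line disallow-everything file; a narrower alternative (assuming the standard gitweb URL scheme, and again relying on the nonstandard `*` wildcard) would be to block only the expensive snapshot generation:

```
# Option A: block all crawling (kernel.org-style)
User-agent: *
Disallow: /

# Option B: block only snapshot tarball generation
User-agent: *
Disallow: /*a=snapshot
```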
Thanks
/*/browser/ has been robots.txt'd.
Related puppet commits: 61c33f2cfb755a2dd32262abc3aca00b8d14580c f2ef439cd50dd3f3a26a75cf580dc9ff2cfd3ace 9af0bb7c751fb9698d9ef5708db9250d64084248