#1848 /robots.txt on fedorahosted.org and git.fedorahosted.org
Closed: Fixed · Opened 14 years ago by mmcgrath.

Can we get a proper robots.txt deployed on fedorahosted? We're hitting several performance issues, and until we can figure out a plan for dealing with them I'd like to restrict Google a bit. Specifically, I want to disallow:

https://fedorahosted.org/*/browser/*

We may need to use mod_rewrite based on the user agent to enforce this; I'm not sure. I'm thinking about doing the same for gitweb's snapshot links when they're requested by crawlers.
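A minimal sketch of what this could look like (the paths and user-agent list below are illustrative assumptions, not taken from the actual puppet config). First, a robots.txt entry covering the Trac source browser:

```
User-agent: *
Disallow: /*/browser/
```

And a mod_rewrite fallback for crawlers that ignore robots.txt, matching on the user agent as suggested above:

```apache
# Hypothetical Apache config: return 403 to known crawlers
# hitting any project's /browser/ pages.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (Googlebot|bingbot|Slurp) [NC]
RewriteRule ^/[^/]+/browser/ - [F]
```

Note that `*` wildcards in robots.txt paths are a Googlebot extension rather than part of the original robots exclusion standard, so the mod_rewrite rule is a useful belt-and-suspenders measure for crawlers that only do prefix matching.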


I know we are currently frozen, but this looks relatively simple, and can be applied once we are unfrozen.

I am a newb to Fedora's puppet practices, so I attached a patch for review/comment/flame :)

As for the git-web stuff: I notice some other F/LOSS gitweb instances disallow everything with robots.txt. See:
http://git.kernel.org/robots.txt
http://git.postgresql.org/robots.txt
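Those instances close the whole site to crawlers with a two-line robots.txt, which would be the simplest option for git.fedorahosted.org as well:

```
User-agent: *
Disallow: /
```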

Thanks

/*/browser/* has been robots.txt'd.

Related puppet commits:
61c33f2cfb755a2dd32262abc3aca00b8d14580c
f2ef439cd50dd3f3a26a75cf580dc9ff2cfd3ace
9af0bb7c751fb9698d9ef5708db9250d64084248
