Ticket #1055 (assigned enhancement)

Opened 5 years ago

Last modified 6 months ago

Fedora Search Engine

Reported by: mmcgrath
Owned by: akistler
Priority: major
Milestone: Fedora 18
Component: Systems
Version: Production
Severity: Normal
Keywords:
Cc: huzaifas, akistler, ausil, rlandmann, fche, codeblock, frankieonuonga, chrisroberts
Blocked By:
Blocking:
Sensitive: no

Description

So Fedora needs a search engine. Here are the requirements as I see them:

  • Crawl the websites
  • Search the websites

Preferences:

  • Python based
  • Allows programmable keywords [1]
  • Has some sort of xml or library interface so other applications can use it

[1] Allow us to have control over what pages get displayed for certain keywords
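
As a rough illustration of the "programmable keywords" preference in [1], the sketch below places hand-picked pages ahead of the crawled results for certain keywords. Everything in it is hypothetical: the `PINNED_RESULTS` table, the `apply_pinned_results` function, and the example.org URLs are placeholders, not part of any candidate search engine.

```python
# Hypothetical sketch of "programmable keywords": an admin-maintained table
# maps keywords to pages that should be shown first for matching queries.
# All names and URLs here are illustrative placeholders.
PINNED_RESULTS = {
    "download": ["https://example.org/download"],
    "docs": ["https://example.org/docs"],
}


def apply_pinned_results(query, crawled_results, pinned=PINNED_RESULTS):
    """Return result URLs with pinned pages for the query's keywords first."""
    ordered = []
    # Pinned pages come first, in the order their keywords appear in the query.
    for word in query.lower().split():
        for url in pinned.get(word, []):
            if url not in ordered:
                ordered.append(url)
    # The engine's own results follow, skipping anything already pinned.
    for url in crawled_results:
        if url not in ordered:
            ordered.append(url)
    return ordered
```

A real engine would likely expose this through its own configuration rather than a Python dictionary; the point is only that the keyword-to-page mapping stays under our control.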

Change History

comment:1 Changed 5 years ago by mmcgrath

  • Owner changed from mmcgrath to huzaifas

comment:2 Changed 5 years ago by mmcgrath

Something I'd like to see out of appropriate candidates is how much storage they take up. Also, no need to code this ourselves.

comment:4 Changed 5 years ago by affix

  • Owner changed from huzaifas to affix

comment:5 Changed 5 years ago by mmcgrath

  • Milestone changed from Fedora 11 to Fedora 13

comment:6 Changed 5 years ago by akistler

  • Summary changed from Fedora Search engine to Fedora Search Engine
  • Cc huzaifas, akistler added

comment:7 follow-up: ↓ 9 Changed 5 years ago by ausil

  • Cc ausil added

We need something that can search more than the wiki. It needs to index fedorahosted.org, fedorapeople.org, and fedoraproject.org.

There are http://www.mnogosearch.org/, http://www.dataparksearch.org/, and http://crawler.archive.org/.

Sadly, none are Python; they are either Java or C. I've not found a Python one yet.
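
The crawl scope described in this comment could be expressed as a host allow-list. This is a minimal sketch that assumes subdomains of the three listed sites should also be indexed; `ALLOWED_DOMAINS` and `in_crawl_scope` are hypothetical names and are not taken from any of the engines mentioned.

```python
from urllib.parse import urlsplit

# Domains named in the comment; treating their subdomains as in scope
# is an assumption made for this sketch.
ALLOWED_DOMAINS = ("fedorahosted.org", "fedorapeople.org", "fedoraproject.org")


def in_crawl_scope(url, allowed=ALLOWED_DOMAINS):
    """Return True if the URL's host is an allowed domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    # Match the domain itself or a dot-separated subdomain, so that
    # "notfedoraproject.org" is not mistaken for "fedoraproject.org".
    return any(host == domain or host.endswith("." + domain) for domain in allowed)
```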

comment:8 Changed 5 years ago by akistler

  • Owner changed from affix to akistler
  • Status changed from new to assigned

There is also Perl, which is neither Java nor C. mnoGoSearch and DataparkSearch were already on the wiki status page in Comment 6. We can add Heritrix and note that it's written in Java.

KinoSearch, Namazu, OpenFTS, and Plucene are Perl. KinoSearch and Namazu appear to be actively maintained. OpenFTS has a Python interface.

In the meantime, reassigning this ticket to me.

comment:9 in reply to: ↑ 7 Changed 4 years ago by rlandmann

  • Cc rlandmann added

Replying to ausil:

We need something that can search more than the wiki. It needs to index fedorahosted.org, fedorapeople.org, and fedoraproject.org.

It also needs to index docs.fedoraproject.org.

Publican, which generates the structure of the documentation site, can incorporate a search form into the navigation menus that it maintains for each language.

comment:10 Changed 4 years ago by fche

  • Cc fche added

FWIW, over on sourceware.org / sources.redhat.com / gcc.gnu.org, we run mnogosearch against the local web sites. It works okay. I believe these servers are in the same colocation facility as fedora*, so we could do a trial run without too much fuss.

comment:11 Changed 3 years ago by lmacken

What is the status with this project?

comment:12 Changed 2 years ago by kevin

  • Milestone changed from Fedora 13 to Fedora 17
  • Cc codeblock added

We now have a dev instance of dpsearch set up at: https://search-dev.fedoraproject.org/search.cgi

It's crawling docs now. Feedback welcome.

comment:13 Changed 20 months ago by puiterwijk

  • Milestone changed from Fedora 17 to Fedora 18

Has there been any progress on this since?

comment:14 Changed 20 months ago by kevin

We had a dev instance, but it got very very very slow, so we reaped it.

I'd really like to see us try again and see if we can figure out what went wrong.

comment:15 Changed 6 months ago by frankieonuonga

  • Sensitive unset

I think this really has to be revisited. I will take it up and reintroduce it on the mailing list. Too many factors have changed since then. We need to remain relevant. We will need to come up with both short-term and long-term goals.

comment:16 Changed 6 months ago by frankieonuonga

  • Cc frankieonuonga added

comment:17 Changed 6 months ago by chrisroberts

  • Cc croberts added

comment:18 Changed 6 months ago by chrisroberts

  • Cc chrisroberts added; croberts removed