Issue #4209: datanommer dump - fedora-infrastructure

Closed: Fixed. Opened 10 years ago by pingou.

On a daily basis we dump the datanommer database for backup.

Since all the data in datanommer is public and accessible through datagrepper, would it be possible to make this dump publicly available somewhere?

ralph commented 10 years ago

If we did this, it would be nice to also have a symlink pointing to whatever the latest dump is so we can add "wget http://fp.o/datanommer-latest.tar.xz" to the datagrepper developer docs.
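A minimal sketch of the "latest" symlink idea, in Python. The function name `point_latest_at` and the dated file names in the comment are hypothetical; this is not a description of what the infrastructure actually runs. The link is created under a temporary name and then renamed over the old one, so a download never hits a missing link.

```python
import os


def point_latest_at(dump_path, link_path):
    """Make link_path a symlink to dump_path, replacing any existing link."""
    link_dir = os.path.dirname(os.path.abspath(link_path))
    # Use a relative target so the link survives if the directory is moved.
    target = os.path.relpath(os.path.abspath(dump_path), link_dir)
    tmp_link = link_path + ".tmp"
    # Clear a stale temporary link left over from an interrupted run.
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(target, tmp_link)
    # Renaming over the old link is atomic on POSIX filesystems.
    os.replace(tmp_link, link_path)


# Example (hypothetical file names):
# point_latest_at("datanommer-2014-01-02.tar.xz", "datanommer-latest.tar.xz")
```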

kevin commented 10 years ago

I think this is a good idea... some questions though:

where would we serve the file from? I don't want to directly allow access to db-datanommer from the net.
If we serve it from somewhere else we need to put in place some method of copying it there. ;)

Also, should we look at all our other existing databases and see if there are others that could be published in this way? tagger? fedocal?

pingou commented 10 years ago

I don't know how busy or accessible the hosted box is, but a subfolder in the project's folder would be nice and simple. I also don't think we need anything other than the latest version, so there's no need to version it by date (imho).

ralph commented 10 years ago

Replying to kevin (comment 2):

> where would we serve the file from? I don't want to directly allow access to db-datanommer from the net.

First thought: the proxies? They seem to be the place we serve most static content from. On the other hand, duplicating all of our database dumps across all of those machines may be wasteful, especially if we're short on space.

Second thought: create an 'app server' just for serving the database dumps, or rather a single machine with the dumps that all the proxies know to look to when a request comes in.

> If we serve it from somewhere else we need to put in place some method of copying it there. ;)

Yeah, we don't typically keep keys around that allow communication between servers (other than lockbox), do we?

> Also, should we look at all our other existing databases and see if there are others that could be published in this way? tagger? fedocal?

Tagger would be a bad fit, I think. It currently keeps track of IP addresses in order to distinguish votes from anonymous users.

pingou commented 10 years ago

pkgdb and fedocal would be ok I think

Maybe we can start with these and see as we grow/have requests.

kevin commented 10 years ago

Here's an idea:

Each system does a daily (or whatever) db dump.
lockbox01 runs a cron job that pulls the db dump (It already has the ansible ssh key for this).
It places them in https://infrastructure.fedoraproject.org/db-dumps/
(or whatever we want to call the dir)
we start doing this with datanommer, pkgdb, and fedocal

Thoughts?
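The plan above could be sketched roughly as follows, in Python. This only builds the rsync commands a cron job on lockbox01 might run; apart from `db-datanommer`, the host names and every path here are hypothetical placeholders, and the real job may look quite different. Each dump is written to a fixed destination name, so a new run simply overwrites the previous copy.

```python
import shlex

# Hypothetical sources: "host:path" of the daily dump for each database.
DUMP_SOURCES = {
    "datanommer": "db-datanommer:/backups/datanommer.dump.xz",
    "pkgdb": "db-example:/backups/pkgdb.dump.xz",
    "fedocal": "db-example:/backups/fedocal.dump.xz",
}

# Hypothetical directory that the web server exposes as db-dumps/.
DEST_DIR = "/srv/web/db-dumps"


def pull_commands(sources, dest_dir):
    """Return one rsync argv list per database dump to fetch over ssh."""
    commands = []
    for name, remote in sorted(sources.items()):
        dest = f"{dest_dir}/{name}.dump.xz"
        commands.append(["rsync", "-e", "ssh", remote, dest])
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, as a dry run.
    for argv in pull_commands(DUMP_SOURCES, DEST_DIR):
        print(shlex.join(argv))
```

Passing each argv list to `subprocess.run(argv, check=True)` would perform the actual copy, assuming the ssh key is already loaded in an agent.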

ralph commented 10 years ago

That sounds good to me, Kevin.

No need, I think, to keep old copies of the db dumps around. This is mostly so people can do development/tinkering so they'd only ever need the latest snapshot.

pingou commented 10 years ago

I like this too :)

It's nice, simple, and easy to put in place. Should we just create an ansible role?

kevin commented 10 years ago

Well, for the cron job it needs to be in puppet, since lockbox01 is still in puppet. Should be similar to the existing ansible-playbook check/diff job. Source the ssh-agent and run commands.

For the db servers, we already have db dumps in place, but we might need to adjust where they are and how they are named, or perhaps not. ;)

I can look at this soon, or one of you can just do it if you'd like.

kevin commented 10 years ago

ok. This should be in place now for datanommer.

http://infrastructure.fedoraproject.org/infra/db-dumps/datanommer.dump.xz

We can add others as we go... :)
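For anyone who wants to script fetching the dump, here is a minimal Python sketch using only the standard library. The helper names `download` and `decompress_xz` are made up for illustration, and the URL is the one quoted above, which may no longer be live. The thread does not say what format the decompressed dump is in, so loading it into a database is left out.

```python
import lzma
import shutil
import urllib.request

# URL as posted in this ticket; it may have moved since.
DUMP_URL = "http://infrastructure.fedoraproject.org/infra/db-dumps/datanommer.dump.xz"


def download(url, dest_path):
    """Stream the file at url to dest_path without loading it into memory."""
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)


def decompress_xz(src_path, dest_path):
    """Decompress an .xz file to dest_path, streaming in chunks."""
    with lzma.open(src_path, "rb") as src, open(dest_path, "wb") as out:
        shutil.copyfileobj(src, out)


# Example use (performs a network request and may download a large file):
# download(DUMP_URL, "datanommer.dump.xz")
# decompress_xz("datanommer.dump.xz", "datanommer.dump")
```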

ralph commented 10 years ago

Pointed the dev docs at it. Thanks!

https://github.com/fedora-infra/datagrepper/commit/3e15e8d38b60e31ec9cb5ec2c1989ec45cca90c8

pingou commented 9 years ago

pkgdb2 is now available there as well \ó/