#4209 datanommer dump
Closed: Fixed None Opened 10 years ago by pingou.

On a daily basis we dump the datanommer database for backup.

Since all the data in datanommer is public and accessible in datagrepper, would it be possible to make this dump available somewhere publicly?


If we did this, it would be nice to also have a symlink pointing to whatever the latest dump is so we can add "wget http://fp.o/datanommer-latest.tar.xz" to the datagrepper developer docs.
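One way a "latest" symlink could be kept up to date is sketched below. This is only an illustration: the function name `point_latest_at` and the dated file names in the usage note are hypothetical, and it assumes the dump and the symlink live on the same POSIX filesystem. The new link is built under a temporary name and then renamed over the old one, so a download never sees a missing link.

```python
import os


def point_latest_at(dump_path, link_path):
    """Repoint the symlink at link_path to dump_path without a gap."""
    link_dir = os.path.dirname(os.path.abspath(link_path))
    # Use a relative target so the link survives if the directory is moved.
    target = os.path.relpath(os.path.abspath(dump_path), link_dir)

    # Build the new link under a temporary name (hypothetical suffix).
    tmp_link = link_path + ".tmp"
    if os.path.lexists(tmp_link):
        os.unlink(tmp_link)
    os.symlink(target, tmp_link)

    # rename(2) replaces the old link itself in one step on POSIX.
    os.replace(tmp_link, link_path)
```

For example, `point_latest_at("dumps/datanommer-2014-01-02.tar.xz", "dumps/datanommer-latest.tar.xz")` would leave `datanommer-latest.tar.xz` pointing at the newer file.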

I think this is a good idea... some questions though:

  • where would we serve the file from? I don't want to directly allow access to db-datanommer from the net.

  • If we serve it from somewhere else we need to put in place some method of copying it there. ;)

Also, should we look at all our other existing databases and see if there are others that could be published in this way? tagger? fedocal?

I don't know how busy/accessible the hosted box is, but a subfolder in the project's folder would be nice and simple. I also don't think we need anything other than the latest version, so there's no need to version the dumps by date (imho).

Replying to [comment:2 kevin]:

  • where would we serve the file from? I don't want to directly allow access to db-datanommer from the net.

First thought: the proxies? They seem to be the place we serve most static content from. On the other hand, duplicating all of our database dumps across all of those machines may be wasteful.. especially if we're short on space at all.

Second thought: create an 'app server' just for serving the database dumps.. rather: a single machine with the dumps that all the proxies know to look to when a request comes.

  • If we serve it from somewhere else we need to put in place some method of copying it there. ;)

Yeah, we don't typically keep keys around that allow communication between servers (other than lockbox), do we?

Also, should we look at all our other existing databases and see if there are others that could be published in this way? tagger? fedocal?

Tagger would be a bad fit, I think. It currently keeps track of IP addresses in order to distinguish votes from anonymous users.

pkgdb and fedocal would be ok I think

Maybe we can start with these and see as we grow/have requests.

Here's an idea:

  • Each system does a daily (or whatever) db dump.

  • lockbox01 runs a cron job that pulls the db dump (It already has the ansible ssh key for this).

  • It places them in https://infrastructure.fedoraproject.org/db-dumps/
    (or whatever we want to call the dir)

  • we start doing this with datanommer, pkgdb, and fedocal
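
The "place them" step above could look roughly like the sketch below. It assumes the cron job has already pulled the dump over ssh to a local path (that part is not shown), and the function name `publish_dump` and its parameters are hypothetical, not anything that exists on lockbox01. The file is copied under a temporary name and then renamed, so the web server never serves a half-written dump and only the latest copy is kept.

```python
import os
import shutil
import tempfile


def publish_dump(fetched_path, publish_dir, name):
    """Copy a fetched dump into publish_dir as name, replacing any older copy."""
    os.makedirs(publish_dir, exist_ok=True)
    # Temporary file in the same directory so the final rename is atomic.
    fd, tmp_path = tempfile.mkstemp(prefix="." + name + ".", dir=publish_dir)
    try:
        with os.fdopen(fd, "wb") as out, open(fetched_path, "rb") as src:
            shutil.copyfileobj(src, out)
        # mkstemp creates the file as 0600; the web server needs to read it.
        os.chmod(tmp_path, 0o644)
        final_path = os.path.join(publish_dir, name)
        os.replace(tmp_path, final_path)
    except BaseException:
        # Do not leave a partial temporary file behind on failure.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path
```

Calling it once per database (datanommer, pkgdb, fedocal) from the cron job would overwrite the previous day's file each time.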

Thoughts?

That sounds good to me, Kevin.

No need, I think, to keep old copies of the db dumps around. This is mostly so people can do development/tinkering so they'd only ever need the latest snapshot.

I like this too :)

It's nice and simple and easy to put in place. Should we just create an ansible role?

Well, for the cron job it needs to be in puppet, since lockbox01 is still in puppet. Should be similar to the existing ansible-playbook check/diff job. Source the ssh-agent and run commands.

For the db servers, we already have db dumps in place, but might need to adjust where they are and how named, or perhaps not. ;)

I can look at this soon, or one of you can just do it if you'd like.

ok. This should be in place now for datanommer.

http://infrastructure.fedoraproject.org/infra/db-dumps/datanommer.dump.xz

We can add others as we go... :)
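
For anyone who wants to tinker with the dump, a minimal sketch for unpacking it once it has been downloaded (with wget, for instance) is below. The function name `decompress_xz` is hypothetical, and the thread does not say what format the decompressed dump is in, so how to load it into a database is left open.

```python
import lzma
import shutil


def decompress_xz(xz_path, out_path):
    """Stream-decompress an .xz file to out_path without loading it all in memory."""
    with lzma.open(xz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path
```

For example, `decompress_xz("datanommer.dump.xz", "datanommer.dump")`.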

pkgdb2 is now available there as well \ó/
