#2718 investigate replication of database resources
Closed: Fixed 7 years ago Opened 13 years ago by kevin.

= phenomenon =

All our db resources are in single instances, leading to a possible single point of failure

We currently have (at least):

db01 - mysql:
cacti
mediawiki
insight
prelude (inactive)
zarafa (mostly inactive)
transifex (inactive)

db02 - postgresql
fedora account system

db03 - postgresql
koji/buildsystem


I need details of the database hosts, and login credentials in order to investigate on these three dbs

for mysql I am using mysql internal replication and works reasonable.

for postgresql and I using bucardo and I have packaged it for fedora.

bucardo requires a cron job [1] or it will wast some disk space and becomes slow after sometime.

I have packaged bucardo for fedora, also there are a new version of bucardo, 4.4.2, the packaged version is still 4.4.0

[1] - http://bucardo.org/wiki/Bucardo/Cron

ok, what sort of info do you need on the hosts?
mem/disk space used by db? anything else?

We typically keep db hosts access pretty restricted. This is because our account system is on them so anyone with db access basically can then leverage that to access anything at all. ;(

kevin, asks infra for 2 or 3 temporary virtual machines and we will setup postgresql replication on it in a testing environment after testing done it can be replicated to the real world.

additionally:

db01 smolt

db02: pkgdb bodhi mirrormanager

We could test this on 2-3 virtual machines. :)

I was thinking we could try and get an idea of our options and how hard they would be first, then look at testing.

Other questions we need to address:

a) Do we have host resources to host the replicants in a non phx2 site?

b) What would be the plan/actions needed if the primary failed? Can we set it so things automatically failover? Or is manual intervention required?

Hi kevic,
im not sure with postgres (let me read on that a bit), but with mysql we can have
1) Replication: this will be asynchronous, hosts need not be in a LAN, fail over will be manual
2) MySQL ClusterL asynchronous, hosts has to be connected via LAN, but fail over is automatic.

you can decide which one we should implement, and get a virtual machine and start with any one mysql db. It should not take more than an hour to setup the replication. Although to sync the entire table can take some time (which we can do even without locking the master)

ok, we can use publictest02.fedoraproject.org and publictest10.fedoraproject.org for testing ideas here. Note that these are also being used for glusterfs testing, but hopefully that won't be too big a problem. We can see about getting you access to those (from sysadmin-test).

Some additional questions/thoughts here:

  • Currently our db hosts are all rhel5. We may want to look at setting this up after they move over to rhel6. Are there any advantages in software versions there? I assume both ends must run the same version?

  • It probibly doesn't make much sense to replicate db03, since it's all buildsystem, and we don't have another site with buildsys setup, so we should concentrate on db01/db02 items.

  • From the testing on the above systems, we should ideally produce a db replication SOP wiki page with how to set things up, and how to switch over to replicant, etc.

rdeys: I know you are out of town for a while, please do ping me when you get back and we can continue moving on this. :) thanks!

For MySQL we can implement an active/passive HA with MySQL master-slave replication and HAproxy thought I've never tested this in a production environment (but I tested it year ago or so).

Started writing out our list of hard requirements and soft requirements: https://fedoraproject.org/wiki/Infrastructure/DB_Replication_Plan

Note that at least some of the mentioned MySQL technologies mentioned earlier in this ticket do not meet the 0 data loss requirement.

So, I have been playing with postgresql BDR and have it now working in staging.

koji.stg.fedoraproject.org is backed by db-koji01.stg and db-koji02.stg that are using postgres bdr (bidirectional replication).

I've also made pgbdr01.stg/pgbdr02.stg and migrated (so far) fedocal to it. It seems to be working great.

It more than meets our needs. I am going to be moving this forward and rolling into production likely after Fedora 25 is out.

@kevin changed the status to Closed

7 years ago

Login to comment on this ticket.

Metadata