Ticket #2718 (assigned task)

Opened 4 years ago

Last modified 2 years ago

investigate replication of database resources

Reported by: kevin Owned by: rdeys
Priority: major Milestone: Fedora 15
Component: General Version:
Severity: Normal Keywords:
Cc: athmane Blocked By:
Blocking: Sensitive: no

Description

phenomenon

All our db resources are in single instances, leading to a possible single point of failure

We currently have (at least):

db01 - mysql: cacti mediawiki insight prelude (inactive) zarafa (mostly inactive) transifex (inactive)

db02 - postgresql fedora account system db03 - postgresql koji/buildsystem

Change History

comment:1 Changed 4 years ago by rdeys

  • Owner changed from sysadmin-members to rdeys
  • Status changed from new to assigned

comment:2 Changed 4 years ago by rdeys

I need details of the database hosts, and login credentials in order to investigate on these three dbs

comment:3 Changed 4 years ago by itamarjp

for mysql I am using mysql internal replication and works reasonable.

for postgresql and I using bucardo and I have packaged it for fedora.

bucardo requires a cron job [1] or it will wast some disk space and becomes slow after sometime.

I have packaged bucardo for fedora, also there are a new version of bucardo, 4.4.2, the packaged version is still 4.4.0

[1] - http://bucardo.org/wiki/Bucardo/Cron

comment:4 Changed 4 years ago by kevin

ok, what sort of info do you need on the hosts? mem/disk space used by db? anything else?

We typically keep db hosts access pretty restricted. This is because our account system is on them so anyone with db access basically can then leverage that to access anything at all. ;(

comment:5 Changed 4 years ago by itamarjp

kevin, asks infra for 2 or 3 temporary virtual machines and we will setup postgresql replication on it in a testing environment after testing done it can be replicated to the real world.

comment:6 Changed 4 years ago by toshio

additionally:

db01 smolt

db02: pkgdb bodhi mirrormanager

comment:7 Changed 4 years ago by kevin

We could test this on 2-3 virtual machines. :)

I was thinking we could try and get an idea of our options and how hard they would be first, then look at testing.

Other questions we need to address:

a) Do we have host resources to host the replicants in a non phx2 site?

b) What would be the plan/actions needed if the primary failed? Can we set it so things automatically failover? Or is manual intervention required?

comment:8 Changed 4 years ago by rdeys

Hi kevic,

im not sure with postgres (let me read on that a bit), but with mysql we can have 1) Replication: this will be asynchronous, hosts need not be in a LAN, fail over will be manual 2) MySQL ClusterL asynchronous, hosts has to be connected via LAN, but fail over is automatic.

you can decide which one we should implement, and get a virtual machine and start with any one mysql db. It should not take more than an hour to setup the replication. Although to sync the entire table can take some time (which we can do even without locking the master)

comment:9 Changed 4 years ago by kevin

ok, we can use publictest02.fedoraproject.org and publictest10.fedoraproject.org for testing ideas here. Note that these are also being used for glusterfs testing, but hopefully that won't be too big a problem. We can see about getting you access to those (from sysadmin-test).

Some additional questions/thoughts here:

  • Currently our db hosts are all rhel5. We may want to look at setting this up after they move over to rhel6. Are there any advantages in software versions there? I assume both ends must run the same version?
  • It probibly doesn't make much sense to replicate db03, since it's all buildsystem, and we don't have another site with buildsys setup, so we should concentrate on db01/db02 items.
  • From the testing on the above systems, we should ideally produce a db replication SOP wiki page with how to set things up, and how to switch over to replicant, etc.

comment:10 Changed 4 years ago by kevin

rdeys: I know you are out of town for a while, please do ping me when you get back and we can continue moving on this. :) thanks!

comment:11 Changed 4 years ago by athmane

  • Cc athmane added

For MySQL we can implement an active/passive HA with MySQL master-slave replication and HAproxy thought I've never tested this in a production environment (but I tested it year ago or so).

comment:12 Changed 2 years ago by toshio

  • Sensitive unset

Started writing out our list of hard requirements and soft requirements: https://fedoraproject.org/wiki/Infrastructure/DB_Replication_Plan

Note that at least some of the mentioned MySQL technologies mentioned earlier in this ticket do not meet the 0 data loss requirement.

Note: See TracTickets for help on using tickets.