Ticket #66 (closed defect: fixed)

Opened 9 years ago

Last modified 9 years ago

Builder auto-reconnect

Reported by: mmcgrath
Owned by: mikem
Priority: minor
Milestone:
Component: client
Version: 1.2.2
Keywords:
Cc: jkeating
Blocked By:
Blocking:


When the builders lose their connection to koji, they should try to auto-reconnect.

Change History

comment:1 Changed 9 years ago by mikem

  • Owner changed from mikeb to mikem

The mechanism is there already. Can you give me some examples of situations where they don't reconnect? (Tracebacks please if possible)

comment:2 Changed 9 years ago by mmcgrath

  • Cc jkeating added

Just two nights ago the koji hub crashed and we rebooted it. I went back to bed, and when I woke up in the morning the builders needed to be restarted.

comment:3 Changed 9 years ago by mikem

  • Status changed from new to assigned

It seems the problem is (at least partly) that the defaults are bad for this use case. By default, the ClientSession code retries 30 times at 20-second intervals, which gives a retry window of about 10 minutes (30 × 20 s = 600 s). This is about right for recovering from network glitches and small outages. If the hub is down for 10 minutes or longer, though, the retries are exhausted and the builder gives up.
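To make the arithmetic concrete, here is a minimal sketch of the retry window. The function and parameter names are illustrative and are not taken from the koji source; only the values 30 and 20 come from this comment. It ignores the time each failed call itself takes, so the real window is somewhat longer.

```python
import math


def retry_window_seconds(max_retries: int, retry_interval: int) -> int:
    """Rough upper bound on how long a client keeps retrying.

    Only the sleeps between attempts are counted; the duration of each
    failed call is ignored.
    """
    return max_retries * retry_interval


def retries_needed(outage_seconds: int, retry_interval: int) -> int:
    """Smallest retry count whose window covers an outage of the given length."""
    return math.ceil(outage_seconds / retry_interval)


# Defaults described above: 30 retries, 20 seconds apart.
default_window = retry_window_seconds(30, 20)   # 600 seconds, i.e. 10 minutes

# Hypothetical target: survive a one-hour hub outage at the same interval.
one_hour = retries_needed(3600, 20)             # 180 retries
```

With these numbers, riding out an outage of several hours needs either a much larger retry count or a longer interval, which is the motivation for making both configurable.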

What I can do is:

  • add some config options to kojid to allow changing these session parameters
  • change the retry behavior for outages (i.e. on 'Connection refused'). Perhaps that situation merits special treatment.
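The two ideas above could be sketched roughly as follows. This is not the koji implementation: `call_with_retries`, `offline_interval`, and the other parameter names are hypothetical, and the policy shown (a refused connection does not consume the retry budget) is only one possible reading of "special treatment".

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    max_retries: int = 30,          # budget for ordinary connection errors
    retry_interval: float = 20.0,   # seconds between ordinary retries
    offline_interval: float = 60.0, # seconds to wait when the server refuses
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying on connection failures.

    'Connection refused' is treated as a server outage: the caller waits
    offline_interval seconds and tries again without using up max_retries.
    Other connection errors and timeouts count against max_retries, and
    the last error is re-raised once that budget is spent.
    """
    tries = 0
    while True:
        try:
            return func()
        except ConnectionRefusedError:
            # Server is down or restarting: keep waiting indefinitely.
            sleep(offline_interval)
        except (ConnectionError, TimeoutError):
            # Network glitch: limited number of retries.
            tries += 1
            if tries > max_retries:
                raise
            sleep(retry_interval)
```

The three numeric parameters are the kind of values that could be exposed as daemon config options. Retrying forever on a refused connection trades a stuck-looking process for not having to restart builders by hand; a cap on total offline time would be a reasonable variation.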

Further along, we can look at other ways to protect kojid from these sorts of issues.

I should point out that, internally, we have a cfengine setup that automatically restarts the build daemon if it dies. Perhaps you can do something similar as an additional measure of protection.

comment:4 Changed 9 years ago by mmcgrath

<nod> At times of high load on the database it can take 5 minutes or more just to get a shell. Working on the db (seeing what queries are running, etc.) can often take much longer.

comment:5 Changed 9 years ago by mikem

commit 54f79ff665fd4147b889b1e18e5846de3476b4e4
Author: Mike McLean <mikem@…>
Date: Fri Feb 22 11:32:27 2008 -0500

make ClientSession retries more configurable, and more robust
add an offline mode to the hub (ServerOffline fault)
report offline status if db connection fails
adjust retry timings for kojid and kojira

comment:6 Changed 9 years ago by mikem

  • Status changed from assigned to closed
  • Resolution set to fixed