Ticket #160 (closed enhancement: fixed)

Opened 6 years ago

Last modified 5 years ago

Generate deltarpms on bodhi queue updates?

Reported by: jdieter Owned by: lmacken
Priority: major Milestone:
Component: bodhi-server Version:
Keywords: Cc: wtogami, jkeating, mikem, sadmac, skvidal
Blocked By: Blocking:

Description

We've been trying to work out how to integrate deltarpms into the Fedora buildsystem, and we're now down to two real options: either generate the deltarpms every time a package is built in koji or generate the deltarpms when bodhi queues a package for updates.

The pros and cons of each are at http://fedoraproject.org/wiki/Infrastructure/PrestoBuildsysIntegration, but we were wondering what it would take for us to generate deltarpms whenever bodhi queues a package for updates.

I would be quite happy to do the implementation, but any thoughts on how it should integrate (if at all) with bodhi would be much appreciated.

Change History

comment:1 Changed 6 years ago by lmacken

  • Status changed from new to assigned

Hey Jonathan,

So, a few questions:

  • what data is needed to produce these deltas?
  • where are we going to store them?
  • are we going to push them out to the same mirrors as our updates?
  • do we want to create deltas at the time of composing the updates repo, or asynchronously when updates are queued for pushing?

comment:2 Changed 6 years ago by jdieter

Thanks for taking a look at this. To answer the questions:

  • To create a single deltarpm, we need the previous rpm and the current rpm. If we want to create more deltarpms, we can use the old deltarpms for the package.
  • I'm not sure where to store them. This is where I was hoping for some advice. Does bodhi actually store any rpms or does koji do the storing?
  • We are planning on pushing the deltarpms to the same mirrors as our updates.
  • The deltarpms can be generated either at compose time or asynchronously, but I'd probably suggest asynchronously so we're not waiting for a bunch of deltarpms to be built. On my server that's creating deltarpms at the moment, it can take up to two hours to generate all the deltarpms for a Rawhide update.
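To make the first answer concrete, here is a minimal Python sketch that builds one deltarpm from the previous rpm and the current rpm by calling deltarpm's makedeltarpm tool. The helper names are hypothetical, and the name-oldver-oldrel_newver-newrel.arch.drpm file naming is an assumption made for illustration.

```python
import subprocess
from pathlib import Path


def delta_filename(name, old_evr, new_evr, arch):
    """Name a deltarpm as name-oldver-oldrel_newver-newrel.arch.drpm.

    This naming scheme is an assumption for the sketch.
    """
    old_ver, old_rel = old_evr
    new_ver, new_rel = new_evr
    return f"{name}-{old_ver}-{old_rel}_{new_ver}-{new_rel}.{arch}.drpm"


def make_delta(old_rpm, new_rpm, out_dir, name, old_evr, new_evr, arch, run=True):
    """Build one deltarpm from the previous rpm and the current rpm.

    Returns the output path and the command. With run=False the command
    is only assembled, which is handy for dry runs.
    """
    out = Path(out_dir) / delta_filename(name, old_evr, new_evr, arch)
    # makedeltarpm's positional arguments: old rpm, new rpm, output deltarpm.
    cmd = ["makedeltarpm", str(old_rpm), str(new_rpm), str(out)]
    if run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return out, cmd
```

Running the tool is the slow part, which is why the asynchronous option looks more attractive than doing this inline at compose time.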

I hope this gives you a clearer picture of what the problem space is.

comment:3 follow-up: ↓ 5 Changed 6 years ago by lmacken

Bodhi uses mash to compose repos to our koji storage drive. We've got a lot more space coming soon, but for now we're almost at capacity. Once we get the new storage in place, we could probably store the deltas alongside our updates repos. Forgive my ignorance with regard to deltarpms, but will these be in a completely separate repository, or do they get stuffed into the existing repo?

Async is probably the way to go. So, what we're looking at is tapping into bodhi's save() method (bodhi/controllers.py:RootController.save). At some point in there, after the error checking, we should probably kick off a thread to generate the deltas. With the new PackageUpdate object, you can call get_latest() on every PackageBuild associated with it (in PackageUpdate.builds) to get the path to the SRPM for the last released update of that package. PackageBuild.get_source_path() will give you the path to the RPMs for the incoming update. From there, you should have everything you need to produce the delta, I think?
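A rough sketch of that flow, assuming get_latest() and get_source_path() each return a path (or None when there is no earlier release). PackageBuild below is a hypothetical stand-in, not bodhi's real model, and make_delta is any callable that builds a delta from an old and a new path.

```python
import threading


class PackageBuild:
    """Hypothetical stand-in for bodhi's PackageBuild.

    The ticket names get_latest() and get_source_path(); their real
    signatures and return values are assumed here.
    """

    def __init__(self, nvr, latest_path, source_path):
        self.nvr = nvr
        self._latest = latest_path
        self._source = source_path

    def get_latest(self):
        return self._latest  # path for the last released build, or None

    def get_source_path(self):
        return self._source  # path for the incoming build


def generate_deltas(builds, make_delta):
    """Call make_delta(old_path, new_path) for every build with a predecessor."""
    for build in builds:
        old = build.get_latest()
        if old is None:  # first release: nothing to diff against
            continue
        make_delta(old, build.get_source_path())


def save(update_builds, make_delta):
    """Sketch of the tail of RootController.save(), after error checking.

    Delta generation runs on a daemon thread so the request returns
    immediately instead of waiting for the deltas to be built.
    """
    worker = threading.Thread(
        target=generate_deltas, args=(update_builds, make_delta), daemon=True
    )
    worker.start()
    return worker
```

A plain thread keeps the sketch short; a separate service (as proposed later in the ticket) would survive web-server restarts better.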

comment:4 Changed 6 years ago by lmacken

  • Cc wtogami added

comment:5 in reply to: ↑ 3 Changed 6 years ago by jdieter

Sorry for the delay. I've been setting up a koji server here along with bodhi to try to work out how it's all going to fit together.

Replying to lmacken:

Bodhi uses mash to compose repos to our koji storage drive. We've got a lot more space coming soon, but for now we're almost at capacity. Once we get the new storage in place, we could probably store the deltas alongside our updates repos. Forgive my ignorance with regard to deltarpms, but will these be in a completely separate repository, or do they get stuffed into the existing repo?

They get stuffed into the existing repo, with an xml file that gets added to repomd.xml using modifyrepo. Size-wise, expect the deltarpms to take up roughly 10% of the space used by the rpms.
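Both points can be put into concrete terms with a short Python sketch: a storage estimate based on the 10% rule of thumb, and the modifyrepo call that registers the delta metadata in repomd.xml. The function names are hypothetical, and the metadata file name prestodelta.xml is an assumption; the ticket only says an xml file is added with modifyrepo.

```python
from pathlib import Path


def estimate_delta_space(rpm_dir, ratio=0.10):
    """Rough storage estimate: deltarpms take about 10% of the rpms' size."""
    total = sum(p.stat().st_size for p in Path(rpm_dir).rglob("*.rpm"))
    return int(total * ratio)


def modifyrepo_cmd(delta_xml, repodata_dir):
    # modifyrepo copies the metadata file into repodata/ and references it
    # from repomd.xml: modifyrepo <input metadata> <output repodata dir>
    return ["modifyrepo", str(delta_xml), str(repodata_dir)]
```

For example, 50 GB of update rpms would suggest planning for roughly 5 GB of deltarpms.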

Async is probably the way to go. So, what we're looking at is tapping into bodhi's save() method (bodhi/controllers.py:RootController.save). At some point in there, after the error checking, we should probably kick off a thread to generate the deltas. With the new PackageUpdate object, you can call get_latest() on every PackageBuild associated with it (in PackageUpdate.builds) to get the path to the SRPM for the last released update of that package. PackageBuild.get_source_path() will give you the path to the RPMs for the incoming update. From there, you should have everything you need to produce the delta, I think?

This looks good to me. Let me see what I can come up with.

comment:6 follow-up: ↓ 8 Changed 6 years ago by jdieter

I wasn't able to get my local instance of bodhi-server to interact properly with my local test koji server (most likely my fault, but that's for another day), so I'm afraid I can't test things properly. A couple of questions, then:

  • Where should I put the deltarpms? Would there be a problem with storing the deltarpms in /mnt/koji/packages/foo/1/1.fc9/[i386|ppc|x86_64]/DRPMS? If I can do this, it will really simplify the code necessary to grab old deltarpms without creating duplicates of rpms.
  • If we store deltarpms in the above location, will koji do the garbage collecting, or will we have to come up with an alternate method of garbage-collecting?

Finally, are there any test servers available that I can use to test this? It's incredibly difficult to code without being able to test what I'm working on.

comment:7 follow-up: ↓ 9 Changed 6 years ago by jdieter

I'm planning on splitting this into two parts:

  • a daemon that takes xml-rpc requests to build deltarpms from a certain old rpm file to a certain new rpm file (and will take a different xml-rpc request to add a .sig file to deltarpms).
  • a patch to bodhi that will work out the files for the latest rpm and the new rpm and then send the xml-rpc request with said files to daemon above.

Any thoughts or comments? Will this work or is there something I'm missing?
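A minimal sketch of that split using Python's standard xmlrpc modules. The PrestoDaemon class and the method names build_delta and add_signature are hypothetical, and the handlers only record each request instead of invoking makedeltarpm or splicing in a signature.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


class PrestoDaemon:
    """Hypothetical request handlers for the proposed delta daemon."""

    def __init__(self):
        self.jobs = []

    def build_delta(self, old_rpm, new_rpm):
        # A real daemon would queue makedeltarpm old_rpm new_rpm <out>.
        self.jobs.append(("delta", old_rpm, new_rpm))
        return True

    def add_signature(self, drpm, sig_file):
        # A real daemon would attach sig_file's signature to drpm.
        self.jobs.append(("sign", drpm, sig_file))
        return True


def serve(host="127.0.0.1", port=0):
    """Start the daemon in a background thread; port 0 picks a free port."""
    server = SimpleXMLRPCServer((host, port), logRequests=False, allow_none=True)
    daemon = PrestoDaemon()
    server.register_instance(daemon)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, daemon


def request_delta(url, old_rpm, new_rpm):
    """Bodhi side: send the old and new rpm paths to the daemon."""
    return ServerProxy(url).build_delta(old_rpm, new_rpm)
```

Because bodhi only needs the daemon's URL, the daemon can run on the bodhi host or on a separate machine without changing the client code.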

comment:8 in reply to: ↑ 6 ; follow-up: ↓ 10 Changed 6 years ago by lmacken

Replying to jdieter:

I wasn't able to get my local instance of bodhi-server to interact properly with my local test koji server (most likely my fault, but that's for another day), so I'm afraid I can't test things properly.

Feel free to file tickets if you encounter more trouble getting your own bodhi instance to work with koji. This isn't something I've thoroughly tested, but I would like to make sure it is fairly intuitive.

A couple of questions, then:

  • Where should I put the deltarpms? Would there be a problem with storing the deltarpms in /mnt/koji/packages/foo/1/1.fc9/[i386|ppc|x86_64]/DRPMS? If I can do this, it will really simplify the code necessary to grab old deltarpms without creating duplicates of rpms.

That's fine with me, but that is Koji Land, so we should probably make sure that this is OK with those guys beforehand.

  • If we store deltarpms in the above location, will koji do the garbage collecting, or will we have to come up with an alternate method of garbage-collecting?

Hmm, I'm not quite sure which would be best. Since /mnt/koji/packages is Koji's territory, it would make sense that it could garbage collect that stuff. On the other hand, bodhi would be the one creating the deltas there, so maybe it should clean up after itself? Again, probably something to bring up with the Koji guys.

Finally, are there any test servers available that I can use to test this? It's incredibly difficult to code without being able to test what I'm working on.

You can try testing it on publictest2, which has read-only /mnt/koji access. Note that this guest will be disappearing in the near future, so make sure to back up your code to be safe.

comment:9 in reply to: ↑ 7 Changed 6 years ago by lmacken

Replying to jdieter:

I'm planning on splitting this into two parts:

  • a daemon that takes xml-rpc requests to build deltarpms from a certain old rpm file to a certain new rpm file (and will take a different xml-rpc request to add a .sig file to deltarpms).
  • a patch to bodhi that will work out the files for the latest rpm and the new rpm and then send the xml-rpc request with said files to daemon above.

Any thoughts or comments? Will this work or is there something I'm missing?

Sounds good. Having a separate daemon will give us the ability to host it alongside of bodhi, or even on a different machine. I can't think of anything that would prevent this model from working.

comment:10 in reply to: ↑ 8 ; follow-up: ↓ 11 Changed 6 years ago by jdieter

Replying to lmacken:

Replying to jdieter:

I wasn't able to get my local instance of bodhi-server to interact properly with my local test koji server (most likely my fault, but that's for another day), so I'm afraid I can't test things properly.

Feel free to file tickets if you encounter more trouble getting your own bodhi instance to work with koji. This isn't something I've thoroughly tested, but I would like to make sure it is fairly intuitive.

I got past the first problem I was hitting, but the later ones are related to not having a local instance of pkgdb. Anyhow, I've at least got it to the point where I can request that an update be pushed to stable, which is all I need for what I'm testing.

A couple of questions, then:

  • Where should I put the deltarpms? Would there be a problem with storing the deltarpms in /mnt/koji/packages/foo/1/1.fc9/[i386|ppc|x86_64]/DRPMS? If I can do this, it will really simplify the code necessary to grab old deltarpms without creating duplicates of rpms.

That's fine with me, but that is Koji Land, so we should probably make sure that this is OK with those guys beforehand.

Do you mind looking into this? I'm not even sure who to ask. If you do want me to do the asking, do you mind directing me to the right person?

  • If we store deltarpms in the above location, will koji do the garbage collecting, or will we have to come up with an alternate method of garbage-collecting?

Hmm, I'm not quite sure which would be best. Since /mnt/koji/packages is Koji's territory, it would make sense that it could garbage collect that stuff. On the other hand, bodhi would be the one creating the deltas there, so maybe it should clean up after itself? Again, probably something to bring up with the Koji guys.

My logic is based on the assumption that Koji just runs "rm -Rf /mnt/koji/packages/foo/1/1.fc9" when it's garbage collecting. If that's correct, it will take care of the deltarpms quite gracefully, and we won't have to keep track of all the deltarpms that we've ever made. We only *need* the deltarpms to be kept until just after the next package update (when we've generated the new deltarpms).
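If deltas ever live somewhere Koji does not clean up, the retention rule above (keep deltas only until the next update's deltas exist) could look like this hypothetical pruning helper. It assumes deltarpms are named name-oldver-oldrel_newver-newrel.arch.drpm, which is a naming convention chosen for the sketch.

```python
from pathlib import Path


def prune_stale_deltas(delta_dir, current_evr, arch):
    """Delete deltarpms that do not target the current build.

    Call this only after the deltas for the new update have been
    generated, since the older deltas may still be needed to build them.
    """
    version, release = current_evr
    keep_suffix = f"_{version}-{release}.{arch}.drpm"
    removed = []
    for drpm in Path(delta_dir).glob("*.drpm"):
        if not drpm.name.endswith(keep_suffix):
            drpm.unlink()
            removed.append(drpm.name)
    return sorted(removed)
```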

Finally, are there any test servers available that I can use to test this? It's incredibly difficult to code without being able to test what I'm working on.

You can try testing it on publictest2, which has read-only /mnt/koji access. Note that this guest will be disappearing in the near future, so make sure to back up your code to be safe.

Thanks, I'll give this a go once I get a bit further.

comment:11 in reply to: ↑ 10 Changed 6 years ago by lmacken

Replying to jdieter:

Do you mind looking into this? I'm not even sure who to ask. If you do want me to do the asking, do you mind directing me to the right person?

Sure, I'll talk to our koji guys.

comment:12 follow-up: ↓ 13 Changed 6 years ago by lmacken

I spoke with jkeating, and he thinks we should store the deltas in /mnt/koji/packages/foo/1/1.fc9/data/deltas, or something of the sort. Garbage collection will require a little bit of code to wipe the deltas, but is definitely feasible.

comment:13 in reply to: ↑ 12 ; follow-up: ↓ 14 Changed 6 years ago by jdieter

Replying to lmacken:

I spoke with jkeating, and he thinks we should store the deltas in /mnt/koji/packages/foo/1/1.fc9/data/deltas, or something of the sort. Garbage collection will require a little bit of code to wipe the deltas, but is definitely feasible.

Okay, a couple of questions on the details.

  • There will be separate deltarpms for each arch, so would it work to save the deltarpms in a subfolder of /mnt/koji/packages/foo/1/1.fc9/i386?
  • Sometimes there's more than one rpm for each source rpm (i.e. bzip2.rpm, bzip2-libs.rpm for bzip2.src.rpm). Can we save the deltarpms in a directory named something like /mnt/koji/packages/foo/1/1.fc9/i386/foo.drpm/ (which would allow foo-lib's deltarpms to be stored in /mnt/koji/packages/foo/1/1.fc9/i386/foo-lib.drpm/)?

If those spots are reserved, it won't be a problem at all to save in /mnt/koji/packages/foo/1/1.fc9/data/deltas, but I thought I'd ask (as the directory structure is already mostly there).
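The per-arch, per-subpackage layout asked about above can be expressed as a small path helper. This is a hypothetical sketch: the root is a parameter because the final location (inside the Koji package tree, under data/deltas, or in a separate tree) was still open at this point.

```python
from pathlib import PurePosixPath


def delta_dir(root, name, version, release, arch, subpackage):
    """Directory holding one subpackage's deltarpms for one arch.

    Proposed layout: <root>/<name>/<version>/<release>/<arch>/<subpackage>.drpm
    """
    return PurePosixPath(root, name, version, release, arch, f"{subpackage}.drpm")


def delta_dirs(root, name, version, release, arches, subpackages):
    """Map (arch, subpackage) to its deltarpm directory for a whole build."""
    return {
        (arch, sub): delta_dir(root, name, version, release, arch, sub)
        for arch in arches
        for sub in subpackages
    }
```

Keeping one directory per binary subpackage means foo and foo-lib deltas never collide, even though both come from the same source rpm.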

comment:14 in reply to: ↑ 13 Changed 6 years ago by lmacken

  • Cc jkeating added

CCing jkeating, who can probably provide some insight into jdieter's questions.

comment:15 follow-up: ↓ 18 Changed 6 years ago by jdieter

Just a ping on #13 plus a question about signing. Does bodhi attach the signatures to the rpms that it pushes out? If not, what actually attaches the signatures? And are the signatures passed as a string or a filename? Just trying to implement an xml-rpc call in the prestod daemon to attach signatures.

comment:16 follow-up: ↓ 17 Changed 6 years ago by mikem

  • Cc mikem added

As far as the path, I would strongly prefer that it be kept out of /mnt/koji. The design of koji is based on the assumption that only koji has write access to this store. I realize that given our mount structure, you might be forced to use /mnt/koji. If that is the case, please leave the established koji dirs alone. Perhaps /mnt/koji/deltas...

Not doing this in Koji means not doing this in Koji. See https://fedorahosted.org/koji/ticket/38 for my previous comments on the subject.

comment:17 in reply to: ↑ 16 Changed 6 years ago by jdieter

Replying to mikem:

As far as the path, I would strongly prefer that it be kept out of /mnt/koji. The design of koji is based on the assumption that only koji has write access to this store. I realize that given our mount structure, you might be forced to use /mnt/koji. If that is the case, please leave the established koji dirs alone. Perhaps /mnt/koji/deltas...

Not doing this in Koji means not doing this in Koji. See https://fedorahosted.org/koji/ticket/38 for my previous comments on the subject.

All I'm trying to do is come up with something that integrates nicely with the current build-system and is reasonably elegant. I'm afraid I don't see the huge difference between generating rpms and generating deltarpms, so it seemed quite logical to me to store the deltarpms with the rpms. With the "attach rpm signature to deltarpm" code I wrote (see https://fedorahosted.org/koji/ticket/38#comment:3) rpm signatures can be spliced and removed from deltarpms just as easily as they are to rpms.

But I don't want to rock the boat, and I really don't want to offend anyone, so I'm quite happy to store the deltarpms in /mnt/koji/deltas or the like.

comment:18 in reply to: ↑ 15 ; follow-up: ↓ 19 Changed 6 years ago by jdieter

Replying to jdieter:

Just a ping on #13 plus a question about signing. Does bodhi attach the signatures to the rpms that it pushes out? If not, what actually attaches the signatures? And are the signatures passed as a string or a filename? Just trying to implement an xml-rpc call in the prestod daemon to attach signatures.

Another ping on this. I'm mainly trying to work out whether the "canonical" deltarpms should have the signatures attached in-place or whether signatures should be attached on the fly as the deltarpms are needed. Also, knowing where the signing actually comes from would be nice.

comment:19 in reply to: ↑ 18 ; follow-up: ↓ 20 Changed 6 years ago by lmacken

Replying to jdieter:

Replying to jdieter:

Just a ping on #13 plus a question about signing. Does bodhi attach the signatures to the rpms that it pushes out? If not, what actually attaches the signatures? And are the signatures passed as a string or a filename? Just trying to implement an xml-rpc call in the prestod daemon to attach signatures.

Another ping on this. I'm mainly trying to work out whether the "canonical" deltarpms should have the signatures attached in-place or whether signatures should be attached on the fly as the deltarpms are needed. Also, knowing where the signing actually comes from would be nice.

Bodhi does not do anything regarding signatures. Jesse would know more about the current signing process than me, but AFAIK he runs a script that signs the RPMs, imports them into koji, and then tells Koji to write out the given RPMs with a signed header for the specified key. So I believe that Koji does the actual attaching of the headers. Please correct me if I'm wrong, guys.

comment:20 in reply to: ↑ 19 ; follow-up: ↓ 21 Changed 6 years ago by lmacken

Jesse, any comments on jdieter's questions in #18 ?

comment:21 in reply to: ↑ 20 Changed 6 years ago by jkeating

Replying to lmacken:

Jesse, any comments on jdieter's questions in #18 ?

Your reply was correct. However in the future, we may have bodhi request signed packages on the fly via the signing server. That is yet to be discussed.

comment:22 Changed 6 years ago by lmacken

  • Cc sadmac added

comment:23 Changed 6 years ago by sadmac

I just pushed some changes to Presto to a personal repo here:

git://fedorapeople.org/~sadmac/presto.git

It moves most of the business code of presto-utils into site-packages, thereby offering the following API:

import presto_utils.gendeltarpms as gendelta
gendelta.createPrestoRepo(folder_o_new_rpms, folder_for_resulting_deltas, folder_o_old_rpms)

import presto_utils.genpresto as genpresto
genpresto.writePrestoData(folder_o_deltas, yum_rpminfo_folder)

comment:24 Changed 6 years ago by jdieter

sadmac, I've pushed your changes to the official repo. Let me know if you want me to push it to Rawhide presto-utils.

comment:25 Changed 5 years ago by lmacken

  • Cc skvidal added

comment:26 follow-up: ↓ 27 Changed 5 years ago by skvidal

patches coming later today. Stayed up too late working on making sure they're happy.

comment:27 in reply to: ↑ 26 Changed 5 years ago by jdieter

Replying to skvidal:

patches coming later today. Stayed up too late working on making sure they're happy.

Excellent! Anything you need changed in yum-presto/deltarpm, let me know.

comment:28 Changed 5 years ago by skvidal

jdieter,

if I send you a URL to a repo with the presto metadata in it, can you check behind me to make sure it is correct?

comment:29 Changed 5 years ago by jdieter

Not a problem at all

comment:30 follow-up: ↓ 31 Changed 5 years ago by skvidal

repo: http://skvidal.fedorapeople.org/misc/delta-test/

jdieter: I winnowed out a lot of bits from your original code. The patch I applied to createrepo HEAD is here:

http://createrepo.baseurl.org/gitweb?p=createrepo.git;a=commitdiff;h=5a5479258a496e6a60cde1bafdd1fa8d3c709a4a

comment:31 in reply to: ↑ 30 ; follow-up: ↓ 32 Changed 5 years ago by jdieter

This looks really good, but there are a couple of corner cases to think about:

  • There really needs to be a way to combine deltarpms. Use case is this:
      ◦ You have A-1.0.rpm and A-2.0.rpm in an updates repository.
      ◦ You generate A-1.0_2.0.drpm, and then remove A-1.0.rpm (because it's an updates repository).
      ◦ You get a new update, A-3.0.rpm.
      ◦ You generate A-2.0_3.0.drpm normally, but you have to use combinedeltarpm to generate A-1.0_3.0.drpm as you no longer have the original A-1.0.rpm.

Even if you did have the original A-1.0.rpm, combinedeltarpm will work faster and use less memory because it's not having to run any heuristics.
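The combining step in the first use case can be sketched as follows. The combine_deltas helper and its (path, from_version, to_version) chain format are hypothetical; the command line assumes combinedeltarpm takes the input deltarpms oldest first, followed by the output file.

```python
import subprocess


def combine_deltas(chain, out_drpm, run=False):
    """Merge consecutive deltarpms into one with combinedeltarpm.

    chain: list of (drpm_path, from_version, to_version), oldest first.
    For example A-1.0_2.0 then A-2.0_3.0 yields A-1.0_3.0 without
    needing A-1.0.rpm on disk.
    """
    if len(chain) < 2:
        raise ValueError("need at least two deltarpms to combine")
    # Each delta must start where the previous one ended.
    for (_, _, to_ver), (_, from_ver, _) in zip(chain, chain[1:]):
        if to_ver != from_ver:
            raise ValueError(f"gap in chain: {to_ver} != {from_ver}")
    cmd = ["combinedeltarpm", *(str(p) for p, _, _ in chain), str(out_drpm)]
    if run:
        subprocess.run(cmd, check=True)
    return cmd
```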

  • There should be some option that allows you to say "save the first deltarpm". Use case:
      ◦ You have A-1.0.rpm in Fedora repository and A-2.0.rpm in updates repository.
      ◦ You're generating deltarpms with the policy that you want to have Fedora->latest-update.drpm and previous-update->latest-update.drpm.
      ◦ You generate A-1.0_2.0.drpm.
      ◦ You get A-3.0.rpm in updates repository.
      ◦ You generate A-2.0_3.0.drpm and A-1.0_3.0.drpm. So far, no problems.
      ◦ You get A-4.0.rpm in updates repository.
      ◦ You want to generate A-1.0_4.0.drpm and A-3.0_4.0.drpm, but *not* A-2.0_4.0.drpm.

Tied into that, I'd probably set the default num_deltas to at least two if oldpackage_paths[] is set. You probably also want to expose num_deltas as a command-line option.
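The "save the first deltarpm" policy and a num_deltas limit can be captured in a small selection function. This is a hypothetical sketch of the policy described above, not createrepo's implementation; only the num_deltas name comes from the discussion.

```python
def delta_pairs(base, updates, num_deltas=2):
    """Pick (old, new) version pairs to build for the newest update.

    base: version shipped in the frozen Fedora release (always used).
    updates: update versions, oldest first; the last one is the newest.
    num_deltas: total deltas to build; the Fedora->latest delta comes
    first, and the rest come from the most recent previous updates.
    """
    if num_deltas < 1 or not updates:
        return []
    newest = updates[-1]
    pairs = [(base, newest)]  # Fedora -> latest update
    previous = [v for v in reversed(updates[:-1]) if v != base]
    for old in previous[: num_deltas - 1]:
        pairs.append((old, newest))  # recent updates -> latest update
    return pairs
```

With the default of two, the A-4.0 update produces A-1.0_4.0 and A-3.0_4.0 but skips A-2.0_4.0, matching the use case above.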

I hope all this makes sense. Thanks again for working on this.

comment:32 in reply to: ↑ 31 ; follow-up: ↓ 33 Changed 5 years ago by skvidal

Replying to jdieter:

This looks really good, but there are a couple of corner cases to think about:

  • There really needs to be a way to combine deltarpms. Use case is this:
      ◦ You have A-1.0.rpm and A-2.0.rpm in an updates repository.
      ◦ You generate A-1.0_2.0.drpm, and then remove A-1.0.rpm (because it's an updates repository).
      ◦ You get a new update, A-3.0.rpm.
      ◦ You generate A-2.0_3.0.drpm normally, but you have to use combinedeltarpm to generate A-1.0_3.0.drpm as you no longer have the original A-1.0.rpm.

Even if you did have the original A-1.0.rpm, combinedeltarpm will work faster and use less memory because it's not having to run any heuristics.

Taking it on as an rfe is fine - I'm not yet convinced it is an important or frequently hit case.

  • There should be some option that allows you to say "save the first deltarpm". Use case:
      ◦ You have A-1.0.rpm in Fedora repository and A-2.0.rpm in updates repository.
      ◦ You're generating deltarpms with the policy that you want to have Fedora->latest-update.drpm and previous-update->latest-update.drpm.
      ◦ You generate A-1.0_2.0.drpm.
      ◦ You get A-3.0.rpm in updates repository.
      ◦ You generate A-2.0_3.0.drpm and A-1.0_3.0.drpm. So far, no problems.
      ◦ You get A-4.0.rpm in updates repository.
      ◦ You want to generate A-1.0_4.0.drpm and A-3.0_4.0.drpm, but *not* A-2.0_4.0.drpm.

we can 'save the first deltarpm' easily b/c we keep the pristine original release of fedora around in perpetuity. F10 is released and it is static. updates is the only thing that rolls.

Tied into that, I'd probably set the default num_deltas to at least two if oldpackage_paths[] is set. You probably also want to expose num_deltas as a command-line option.

Most of our advanced callers that make repositories use the createrepo api. I'll add it as a cli option, but anyone wanting to do something involved should consider using the api.

I hope all this makes sense. Thanks again for working on this.

it does make sense. I'm not completely sure how important some of the above cases are for the generic/simple use but we can work on them as we go.

thanks.

comment:33 in reply to: ↑ 32 Changed 5 years ago by jdieter

Replying to skvidal:

Replying to jdieter:

  • There really needs to be a way to combine deltarpms. Use case is this:

Taking it on as an rfe is fine - I'm not yet convinced it is an important or frequently hit case.

Fair enough. Any time someone wants to create deltarpms with a source rpm that no longer exists, we'll hit this, but for the most simple (and common) case, it's not needed.

  • There should be some option that allows you to say "save the first deltarpm". Use case:

we can 'save the first deltarpm' easily b/c we keep the pristine original release of fedora around in perpetuity. F10 is released and it is static. updates is the only thing that rolls.

Yeah, I'm aware of that, and you can easily implement this without implementing my first suggestion. It's just one of those use-cases that's fairly common.

Tied into that, I'd probably set the default num_deltas to at least two if oldpackage_paths[] is set. You probably also want to expose num_deltas as a command-line option.

Most of our advanced callers that make repositories use the createrepo api. I'll add it as a cli option, but anyone wanting to do something involved should consider using the api.

Yeah, fair enough.

comment:34 Changed 5 years ago by lmacken

  • Resolution set to fixed
  • Status changed from assigned to closed

We are now generating deltarpms during pushes! Closing...
