This page is locked for edits: http://fedoraproject.org/wiki/Packaging:SourceURL
I would like to extend it with tips on how to work with upstream archives that don't contain the name and/or version in the URL (most notably github tag downloads), because I've found this very helpful: https://bugzilla.redhat.com/show_bug.cgi?id=871191#c6
Proposed text:
{{{
== URLs missing name or version (e.g. github) ==
Your upstream tarball might not contain the program name or version in the file name. This is the case when you download tagged archives from [https://github.com/ github]. The solution is to append a fake #/%{name}-%{version}.tar.gz suffix to the URL; [[Rpmdevtools|spectool]] will then use that information to name the downloaded archive correctly.
Example:
Source0: https://github.com/<username>/%{name}/archive/v%{version}.tar.gz#/%{name}-%{version}.tar.gz
And the "Troublesome URLs" section should be adjusted accordingly (some of the problems mentioned are fixed by the solution above).
Please adjust the page content because I can't.
PS: It would also be very helpful to add banners at the bottom of all locked wiki pages that say why the page is locked and where the best place to suggest improvements is. The current process puts off potential contributors because it requires you to spend a lot of time searching for information.
Is this implemented via tags? Are the tags changeable by upstream?
Currently the guidelines would say that you need to use a snapshot to do this sort of thing. So this is more than a clarification. However, if there's no way for the URL specified to retrieve something different at a later time, it might be okay to allow this.
The "v" seems to be part of the particular upstream's tagging. Here's a slightly clarified form of the draft:
{{{ == URLs missing name or version (e.g. github) ==
Sometimes the upstream tarball might not contain the program name or version in the file name. For instance, this is the case when you download tagged archives from [https://github.com/ github]. If you append a fake #/%{name}-%{version}.tar.gz suffix to the URL, the [[Rpmdevtools|spectool]] program will use that information to rename the downloaded archive correctly.
Example:
Source0: https://github.com/USERNAME/%{name}/archive/UPSTREAM_TAG.tar.gz#/%{name}-%{version}.tar.gz
Yes, the "v" was part of a git tag. Example page here:
https://github.com/kparal/sendKindle/tags
(mouse over the links to see .tar.gz downloads).
It is technically possible to change the contents of a tag; you can force git to move tags. But that is similar to upstream changing the contents of a tarball. While it is technically possible, it's widely discouraged. So in this respect I see the github interface as equivalent to a directory with tarballs.
Since github is very popular and the number of projects that release tarballs is diminishing (why release tarballs when you have git tags and an easy way to download them), I find the described approach very beneficial and worth adding to the wiki page.
Your new draft looks better, thanks.
Just some drive-by comments:
Bottom line: github might be such an important case that it deserves special treatment, just like sourceforge.
EDIT: Fixing bogus linking, adding last example URL
Replying to [comment:4 leamas]:
There is another scheme: !https://github.com/$user/$project/tarball/$commit/whatever.tar.gz which will download a given commit as a gzipped tarball named whatever.tar.gz.
But this has the disadvantage that the directory inside the tarball has the commit hash in its name, which makes the spec file harder to write.
Still only for github: the #/filename tweak is a good one, but perhaps not the best solution for github which can use something like !https://github.com/kparal/sendKindle/archive/v2/whatever.tar.gz
Nice, that works, unfortunately their UI doesn't allow you to discover that.
I agree. It would be nice to put the #/filename tweak as a generic tip, and then add a specific tip about github.
It does seem like best practice to use something that cannot change and github seems as important as sourceforge to us now.
Looks like there are several permutations of the github URLs that can generate tarballs with slight differences.
New proposal:
{{{ === Github ===
Increasingly, we find that some upstream projects are not releasing tarballs. On github, those projects will often tag the source tree with a version and then have github generate tarballs on demand when a user wishes to download the tree for that tag. Unfortunately, tags can be moved so they aren't the best method for retrieving reproducible sources. However, github provides another way to retrieve tarballs based on commit hash. This is our preferred method of getting archives.
First, find the commit hash that the tag points to:
git log -n1 TAG
Then define it in the spec file:
%global commit a247ef03721f9a1c6ec5eacd847630761d75dcc3
and use it in the spec file's Source: line:
https://github.com/PROJECT_OWNER/PROJECT_NAME/archive/%{commit}/%{name}-%{version}.tar.gz
PROJECT_OWNER and PROJECT_NAME should be replaced with the specifics of the upstream project. The last part of the URL, %{name}-%{version}.tar.gz, is optional but determines the filename being downloaded. Note that the toplevel directory within the tarball won't correspond to this name, so you'll need to use:
%setup -n PROJECT_NAME-%{commit}
}}}
And change the troublesome URLs section like so:
{{{ === Troublesome URLs ===
When upstream has download URLs that do not end with the tarball name, rpm will be unable to parse the tarball name out of the source URL. One workaround for many cases is to construct a URL where the tarball is listed in a "URL fragment":
Source0: http://example.com/foo/1.0/download.cgi#/%{name}-%{version}.tar.gz
rpm will then use %{name}-%{version}.tar.gz as the tarball name. If you use spectool -g foo.spec to download the tarball, it will rename the tarball for you.
Sometimes this does not work because the upstream cgi tries to parse the fragment or because you need to log in or fill in a form to access the tarball. In these cases, you have to put just the tarball's filename into the Source: field. To make clear where you got the tarball, you should leave notes in comments above the Source: line to explain the situation to reviewers and future packagers. For example:
# Mysql has a mirror redirector for its downloads
# You can get this tarball by following a link from:
# http://dev.mysql.com/downloads/mysql/5.1.html
Source0: mysql-5.1.31.tar.gz
}}}
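The fragment workaround in the draft above relies on rpm naming the source after the last path component of the URL. A minimal shell sketch of that idea follows; the helper name `source_basename` is hypothetical, the macros are expanded by hand for an imaginary package foo 1.0, and this only imitates the name selection rather than reproducing rpm's actual code:

```shell
# Hypothetical helper: print everything after the last "/" in a URL,
# imitating how the source file name is derived from the Source: URL.
source_basename() {
    url=$1
    printf '%s\n' "${url##*/}"
}

# Plain URL: the last component is not a useful tarball name.
source_basename 'http://example.com/foo/1.0/download.cgi'

# Same URL with the "#/" fragment appended (macros expanded by hand).
source_basename 'http://example.com/foo/1.0/download.cgi#/foo-1.0.tar.gz'
```

The first call prints `download.cgi` and the second prints `foo-1.0.tar.gz`, which is why appending the fragment gives the downloaded file a sensible name.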
"Troublesome URLs" looks good. I don't really like the "Github" section. Yes, theoretically tags can change, but tarballs can change too. In both cases it happens very rarely and is strongly frowned upon. I would rather list both approaches and let the packager choose based on the project he's packaging and his current needs. Sometimes we want to package specific commits, and then this is useful. But sometimes we want to package just "stable" (tagged) versions and we know the upstream project is reasonable and doesn't change tags (or tarballs) after release. In the latter case the commit approach is just unnecessarily complicated.
Replying to [comment:6 toshio]:
There is also the issue that tarballs checked out this way don't always have the same modification dates, and thus have different checksums even for the same commit (or has it been fixed?). Somewhere in the back of my head I think this might be standard git behaviour, something about cloning a tag vs a commit. I can't find the reference, and might be utterly wrong. Anyway, this needs to be sorted out and mentioned IMHO.
Of course, this is like svn checkouts, also with this issue.
As kparal, I don't see any problems in the "Troublesome URLs" part.
However, the github part still has some issues. One basic problem is the difference between those upstreams which at least tag their releases and those which just present a flow of commits, leaving it to the packager to decide which to use. No, we don't like that, but this is sometimes the reality.
That said:
 - Since the starting point is that there is no version, why encode %{version} in the tarball name? At best this is pointless, perhaps even misleading?!
 - Using a git commit also means that it will go into the release tag. However, having the full hash in the release tag (and thus in the generated RPM names) is just too much. For this, the short commit is the natural choice.
 - A reference to the Versioning guidelines (snapshot, pre-release or post-release) might be useful.
 - Yes, github resets modification dates to the time of tarball generation, checked. Should the guidelines mention that diff -r should be used to verify these tarballs?
Some fragments from a spec where I tried this:
{{{
%global commit d4a1dba64e770a8150777067a791ba8fadc2e668
%global shortcommit %(c=%{commit}; echo ${c:0:7})

Version: 0.0.0
Release: 1.%{shortcommit}%{?dist}

URL: https://github.com/K-S-V/Scripts
Source0: %{url}/archive/%{commit}/%{name}-%{shortcommit}.tar.gz

%prep
%setup -q -n Scripts-%{commit}
}}}
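For readers unfamiliar with the `%(...)` construct, the shortcommit macro simply runs a shell snippet that keeps the first seven characters of the hash. A rough standalone sketch is below; it uses `printf '%.7s'` as a POSIX-portable stand-in for the bash-style `${c:0:7}` in the spec, which is a substitution made here and not what the spec itself does:

```shell
# Full commit hash, copied from the spec fragment above.
commit=d4a1dba64e770a8150777067a791ba8fadc2e668

# Keep only the first 7 characters; same result as bash's ${commit:0:7}.
shortcommit=$(printf '%.7s' "$commit")

# The short form goes into Release: and the tarball name ...
echo "$shortcommit"
# ... while the directory inside the tarball still uses the full hash.
echo "Scripts-$commit"
```

This shows why `%setup` needs `-n Scripts-%{commit}` even though the tarball name uses the short hash.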
I still think using commits is the way to go, partly because a commit is immutable but also because it's always available; not all projects even have tags. And describing both schemes (tags + commits) is just too much; this is too complicated anyway.
New sketch for the github stuff. slask
Troublesome URLs section, as described in comment #6 is approved (+1:5, 0:0, -1:0)
We are going to inform Panu (upstream RPM) that we are adding policy which uses this feature (quirk?) of RPM, and if he is unhappy, we will revisit it.
As for the github specific section, I will write a new draft for consideration and we will revisit that next week.
As many upstreams use github for their source control, it is worth covering how to handle that source in a Fedora Package.
Github provides a mechanism to create tarballs on demand, either from a specific commit revision, or from a specific tag. For a number of reasons (immutability, availability, uniqueness), you must use the full commit revision hash when referring to the sources.
The full 40-character hash can be copied from the github web interface or by cloning the repository and using git rev-parse TAG
In this example, TAG is the tag for the source revision we are interested in.
Once the commit hash is known, you can define it in your spec file as follows:
%global commit c5a4525bfa3bd9997834d0603c40093e50e3fd19
%global shortcommit %(c=%{commit}; echo ${c:0:7})
For the source tarball, you should use this syntax:
Source0: https://github.com/$OWNER/$PROJECT/archive/%{commit}/%{name}-%{version}.tar.gz
...
%prep
%setup -qn %{name}-%{commit}
In this syntax, $OWNER must be replaced with the github username for the project's owner, and $PROJECT must be replaced with the github identifier for the project.
If the release corresponds to a github Tag with a sane numeric version, you must use that version to populate the Version field in the spec file. If it does not, look at the source code to see if a version is indicated there, and use that value. If no numeric version is indicated in the code, you may set Version to 0, and treat the package as a "pre-release" package (and make use of the %{shortcommit} macro). See [[Packaging:NamingGuidelines#Pre-Release_packages]] for details.
Alternately, if you are using a specific revision from github that is either a pre-release revision or a post-release revision, you must follow the "snapshot" guidelines. They are documented here: [[Packaging:NamingGuidelines#Snapshot_packages]]. You can substitute %{shortcommit} for %{checkout} in that section.
Keep in mind that github tarballs are generated on-demand, so their modification dates will vary and cause checksum tests to fail. Reviewers will need to use diff -r to verify the tarballs. }}}
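As an illustration of the diff -r check mentioned in the draft, here is a small sketch that compares two directory trees with identical contents but different modification times. The helper name `same_tree` and the throwaway directories are hypothetical stand-ins for two unpacked tarballs; in a real review you would first extract the upstream and SRPM tarballs into separate directories:

```shell
# Hypothetical helper: succeed when two extracted source trees have
# identical contents, regardless of file modification times.
same_tree() {
    diff -r "$1" "$2" >/dev/null
}

# Two throwaway trees standing in for two unpacked tarballs.
work=$(mktemp -d)
mkdir "$work/a" "$work/b"
echo 'hello' > "$work/a/file.txt"
echo 'hello' > "$work/b/file.txt"

# Give the copies different modification times, as regenerated tarballs would have.
touch -t 201301010000 "$work/a/file.txt"
touch -t 201401010000 "$work/b/file.txt"

if same_tree "$work/a" "$work/b"; then
    result=identical
else
    result=different
fi
echo "$result"

rm -rf "$work"
```

The comparison reports the trees as identical because diff -r looks only at file contents, not timestamps.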
Couple of additions. First one makes clear that this is about projects that don't produce archives for releases rather than anything hosted on github. Second just adds some helpful information.
Replying to [comment:11 spot]:
=== Github ===
As many upstreams use github for their source control, it is worth covering how to handle that source in a Fedora Package.
Github provides a mechanism to create tarballs on demand, either from a specific commit revision, or from a specific tag.
[addition]
If the upstream does not create tarballs for releases, you can use this mechanism to produce them.
[end addition]
For a number of reasons (immutability, availability, uniqueness), you must use the full commit revision hash when referring to the sources.
The full 40-character hash can be copied from the github web interface
at https://github.com/OWNER/PROJECT/tags
or by cloning the repository and using git rev-parse TAG
Replying to [comment:10 spot]:
Troublesome URLs section, as described in comment #6 is approved (+1:5, 0:0, -1:0)
We are going to inform Panu (upstream RPM) that we are adding policy which uses this feature (quirk?) of RPM, and if he is unhappy, we will revisit it.
Doesn't seem particularly evil to me... all rpmbuild really cares about is the "basename" of the source. If you don't mind knowingly putting garbage URLs into specs, I'm not going to care either :)
[snip]
If no numeric version is indicated in the code, you may set Version to 0, and treat the package as a "pre-release" package (and make use of the %{shortcommit} macro).
Which means a lot of different sources, all named %name-0.tar.gz. Perhaps not a problem, but my gut feeling is bad. Would prefer %{name}-%{shortcommit} at least when there's no usable version
EDIT: Removed previous edit, which just was silly "blushes".
https://fedoraproject.org/wiki/User:Spot/GitHub_Guidelines approved (+1:5, 0:0, -1:0)
Written up, about to be announced.
So this text has come up twice on devel@ recently, and I've just run up against it again in a package review:
https://bugzilla.redhat.com/show_bug.cgi?id=1047510
I think I'm reading FPC's intent correctly - that the text is intended to essentially state "don't use github's ability to generate tarballs from version tags, even if you want to ship a specific version, generate a tarball from the commit id currently associated with that tag" - but I don't think this is absolutely clear from the text as it stands. Could a very specific statement to this effect be added?
Yes, the problem is that what commit a version points to can change while a commitid can't change. So if you want to download the same tarball you can't use a version.
If you want to change the wording in the documentation, feel free to propose new wording.
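To make the distinction concrete, the two URL shapes discussed in this ticket can be sketched as follows. The function names and the OWNER, PROJECT, and version values are placeholders invented for illustration; the URL patterns themselves are the ones quoted earlier in the thread, and the sketch only builds strings without contacting github:

```shell
# Tag-based archive URL, renamed through the "#/" fragment.
# The tag can be moved upstream, so the result is not guaranteed to be stable.
# Arguments: owner project tag name version
tag_url() {
    printf 'https://github.com/%s/%s/archive/%s.tar.gz#/%s-%s.tar.gz\n' "$1" "$2" "$3" "$4" "$5"
}

# Commit-based archive URL, the form the approved guideline requires.
# Arguments: owner project full-commit-hash name version
commit_url() {
    printf 'https://github.com/%s/%s/archive/%s/%s-%s.tar.gz\n' "$1" "$2" "$3" "$4" "$5"
}

tag_url OWNER PROJECT v1.0 PROJECT 1.0
commit_url OWNER PROJECT c5a4525bfa3bd9997834d0603c40093e50e3fd19 PROJECT 1.0
```

Both forms produce a tarball named PROJECT-1.0.tar.gz; only the second pins the content to a specific commit.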
Metadata Update from @toshio: - Issue assigned to spot