#233 SourceURL addition: github-like download URLs
Closed: Fixed None Opened 11 years ago by kparal.

This page is locked for edits:
http://fedoraproject.org/wiki/Packaging:SourceURL

I would like to extend it with tips on how to work with upstream archives that don't contain the name and/or version in the URL (most notably github tag downloads), because I've found this very helpful:
https://bugzilla.redhat.com/show_bug.cgi?id=871191#c6

Proposed text:
{{{
== URLs missing name or version (e.g. github) ==
Your upstream tarball might not contain the program name or version in the file name. This is the case when you download tagged archives from [https://github.com/ github].
The solution is to append a fake #/%{name}-%{version}.tar.gz suffix to the URL; [[Rpmdevtools|spectool]] will then use that information to name the downloaded archive correctly.
Example:

Source0: https://github.com/<username>/%{name}/archive/v%{version}.tar.gz#/%{name}-%{version}.tar.gz

}}}
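
To illustrate why the suffix is harmless to the download itself, here is a rough sketch (not spectool's actual code). Everything after `#` is a URL fragment, which is not part of the path requested from the server, while the text after the last `/` of the whole string is what ends up as the local file name. The helper `local_filename`, the user `someuser` and the package `foo` are made up for the example, and the macros are shown already expanded.

```python
from urllib.parse import urlsplit


def local_filename(source_url: str) -> str:
    """Return the text after the last '/' of a Source line (hypothetical helper)."""
    return source_url.rsplit("/", 1)[-1]


# Hypothetical upstream: user "someuser", package "foo", tag "v1.0".
url = "https://github.com/someuser/foo/archive/v1.0.tar.gz#/foo-1.0.tar.gz"

parts = urlsplit(url)
print(parts.path)           # the path the server sees
print(parts.fragment)       # the fake suffix, kept on the client side
print(local_filename(url))  # the name the downloaded file should get
```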

And the "Troublesome URLs" section should be adjusted accordingly (some of the problems mentioned are fixed by the solution above).

Please adjust the page content because I can't.

PS: It would also be very helpful to add banners at the bottom of all locked wiki pages that say why the page is locked and where best to suggest improvements. The current process puts off potential contributors, because it requires you to spend a lot of time searching for information.


Is this implemented via tags? Are the tags changeable by upstream?

Currently the guidelines would say that you need to use a snapshot to do this sort of thing. So this is more than a clarification. However, if there's no way for the specified URL to retrieve something different at a later time, it might be okay to allow this.

The "v" seems to be part of the particular upstream's tagging. Here's a slightly clarified form of the draft:

{{{
== URLs missing name or version (e.g. github) ==

Sometimes the upstream tarball might not contain the program name or version
in the file name. For instance, this is the case when you download tagged
archives from [https://github.com/ github]. If you append a fake
#/%{name}-%{version}.tar.gz suffix to the URL the
[[Rpmdevtools|spectool]] program will use that information to rename the
downloaded archive correctly.

Example:

Source0: https://github.com/USERNAME/%{name}/archive/UPSTREAM_TAG.tar.gz#/%{name}-%{version}.tar.gz
  • USERNAME is the github username of upstream.
  • UPSTREAM_TAG is upstream's tag for this release.
}}}

Yes, the "v" was part of a git tag. Example page here:

https://github.com/kparal/sendKindle/tags

(mouse over the links to see .tar.gz downloads).

It is technically possible to change the contents of a tag; you can force git to move tags. But that is similar to upstream changing the contents of a tarball. While it is technically possible, it's widely discouraged. So in this respect I see the github interface as equivalent to a directory with tarballs.

Since github is very popular and the number of projects that release tarballs is diminishing (why release tarballs when you have git tags and an easy way to download them), I find the described approach very beneficial and worth adding to the wiki page.

Your new draft looks better, thanks.

Just some drive-by comments:

Bottom line: github might be such an important case that it deserves special treatment, just like sourceforge.

EDIT: Fixing bogus linking, adding last example URL

Replying to [comment:4 leamas]:

But this has the disadvantage that the directory inside the tarball has the commit hash in its name, which makes the spec file harder to write.

Nice, that works, unfortunately their UI doesn't allow you to discover that.

> Bottom line: github might be such an important case that it deserves special treatment, just like sourceforge.

I agree. It would be nice to put the #/filename tweak as a generic tip, and then add a specific tip about github.

It does seem like best practice to use something that cannot change and github seems as important as sourceforge to us now.

Looks like there are several permutations of the github URLs that can generate tarballs with slight differences.

New proposal:

{{{
=== Github ===

Increasingly, we find that some upstream projects are not releasing
tarballs. On github, those projects will often tag the source tree
with a version and then have github generate tarballs on demand
when a user wishes to download the tree for that tag. Unfortunately,
tags can be moved so they aren't the best method for retrieving
reproducible sources. However, github provides another way to
retrieve tarballs based on commit hash. This is our preferred method
of getting archives.

Find the commit hash for the version.

  • This can be done by browsing to the tags interface (
    https://github.com/PROJECT_OWNER/PROJECT_NAME/tags ) and
    copying the full commit hash listed there for the desired tag.
  • It can also be found by checking out the upstream repository
    and using git log -n1 TAG where TAG is the name
    of the tag we're interested in.

Put the full git commit hash into a macro, for instance:

%global commit a247ef03721f9a1c6ec5eacd847630761d75dcc3

Use github's URL scheme for downloading via the commit hash
in the spec file's Source: line:

https://github.com/PROJECT_OWNER/PROJECT_NAME/archive/%{commit}/%{name}-%{version}.tar.gz

PROJECT_OWNER and PROJECT_NAME should be replaced with the
specifics of the upstream project. The last part of the URL
which has %{name}-%{version}.tar.gz is optional but determines
the filename being downloaded. Note that the toplevel
directory within the tarball won't correspond to this name.
So you'll need to use:
%setup -n PROJECT_NAME-%{commit}
}}}
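
As a minimal sketch of how the pieces of this proposal fit together, the function below assembles the Source URL and the top-level directory name from the values a packager would fill in. The function name `github_commit_source` and the owner, project, name and version used in the example are hypothetical; only the URL layout and the sample hash come from the draft above.

```python
import re

# A full git commit hash: exactly 40 lowercase hex digits.
FULL_HASH = re.compile(r"[0-9a-f]{40}")


def github_commit_source(owner, project, commit, name, version):
    """Return (source_url, top_level_dir) for a commit-based github archive."""
    if not FULL_HASH.fullmatch(commit):
        raise ValueError("expected the full 40-character commit hash")
    # The last URL component only decides the downloaded file name.
    url = (
        f"https://github.com/{owner}/{project}/archive/"
        f"{commit}/{name}-{version}.tar.gz"
    )
    # The directory inside the tarball is named after the project and commit,
    # which is why %setup needs an explicit -n argument.
    top_level_dir = f"{project}-{commit}"
    return url, top_level_dir


# Hypothetical values, apart from the sample hash used in the draft.
url, topdir = github_commit_source(
    "someowner", "foo", "a247ef03721f9a1c6ec5eacd847630761d75dcc3", "foo", "1.0"
)
print(url)
print(topdir)
```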

And change the troublesome URLs section like so:

{{{
=== Troublesome URLs ===

When upstream has URLs for the download that do not end
with the tarball name rpm will be unable to parse the
tarball out of the source URL. One workaround for many
cases is to construct a URL where the tarball is listed
in a "URL fragment":

Source0: http://example.com/foo/1.0/download.cgi#/%{name}-%{version}.tar.gz

rpm will then use %{name}-%{version}.tar.gz as the tarball
name. If you use spectool -g foo.spec to
download the tarball, it will rename the tarball for you.

Sometimes this does not work because the upstream cgi tries
to parse the fragment or because you need to login or fill
in a form to access the tarball. In these cases, you have
to put just the tarball's filename into the Source: field.
To make clear where you got the tarball, you should leave
notes in comments above the Source: line to explain the
situation to reviewers and future packagers. For example:

 # Mysql has a mirror redirector for its downloads
 # You can get this tarball by following a link from:
 # http://dev.mysql.com/downloads/mysql/5.1.html
 Source0: mysql-5.1.31.tar.gz

}}}

"Troublesome URLs" looks good. I don't really like the "Github" section. Yes, theoretically tags can change, but tarballs can change too. In both cases it happens very rarely and is heavily frowned upon. I would rather list both approaches and let the packager choose based on the project he's packaging and his current needs. Sometimes we want to package specific commits, and then this is useful. But sometimes we want to package just "stable" (tagged) versions and we know the upstream project is reasonable and doesn't change tags (or tarballs) after release. In the latter case the commit approach is just unnecessarily complicated.

Replying to [comment:6 toshio]:

There is also the issue that tarballs checked out this way don't always have the same modification dates, and thus have different checksums even for the same commit (or has it been fixed?). Somewhere in the back of my head I think this might be standard git behaviour, something about cloning a tag vs a commit. I can't find the reference, and might be utterly wrong. Anyway, this needs to be sorted out and mentioned IMHO.

Of course, svn checkouts have the same issue.

As kparal, I don't see any problems in the "Troublesome URLs" part.

However, the github part still has some issues. One basic problem is the difference between upstreams that at least tag their releases and those that just present a flow of commits, leaving it to the packager to decide which to use. No, we don't like that, but this is sometimes the reality.

That said:
- Since the starting point is that there is no version, why encode %{version} in the tarball name? At best this is pointless, perhaps even misleading?!
- Using a git commit also means that it will go into the release tag. However, having the full hash in the release tag (and thus in the generated RPMs' names) is just too much. For this, the short commit is the natural choice.
- A reference to the Versioning guidelines (snapshot, pre-release or post-release) might be useful.
- Yes, github resets modification dates to the time of tarball generation; I checked. Should the guidelines mention that diff -r should be used to verify these tarballs?

Some fragments from a spec where I tried this:

{{{
%global commit d4a1dba64e770a8150777067a791ba8fadc2e668
%global shortcommit %(c=%{commit}; echo ${c:0:7})

Version: 0.0.0
Release: 1.%{shortcommit}%{?dist}

URL: https://github.com/K-S-V/Scripts
Source0: %{url}/archive/%{commit}/%{name}-%{shortcommit}.tar.gz

%prep
%setup -q -n Scripts-%{commit}
}}}
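
For readers less used to shell parameter expansion: `${c:0:7}` in the `shortcommit` macro simply takes the first seven characters of the hash. A small sketch of the values the fragment above would produce, using the same hash; `%{?dist}` is left out because it depends on the build target.

```python
commit = "d4a1dba64e770a8150777067a791ba8fadc2e668"

# Same slice as the shell expansion ${c:0:7}.
shortcommit = commit[:7]

# Release: 1.%{shortcommit}%{?dist}, with %{?dist} omitted here.
release = f"1.{shortcommit}"

# %setup -q -n Scripts-%{commit}: the unpacked directory keeps the full hash.
setup_dir = f"Scripts-{commit}"

print(shortcommit)
print(release)
print(setup_dir)
```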

I still think using commits is the way to go, partly because a commit is immutable but also because it's always available: not all projects even have tags. And describing both schemes (tags + commits) is just too much; this is too complicated anyway.

New sketch for the github stuff.
slask

Troublesome URLs section, as described in comment #6 is approved (+1:5, 0:0, -1:0)

We are going to inform Panu (upstream RPM) that we are adding policy which uses this feature (quirk?) of RPM, and if he is unhappy, we will revisit it.

As for the github specific section, I will write a new draft for consideration and we will revisit that next week.

{{{
=== Github ===

As many upstreams use github for their source control, it is worth covering how to
handle that source in a Fedora Package.

Github provides a mechanism to create tarballs on demand, either from a specific commit
revision, or from a specific tag. For a number of reasons (immutability, availability,
uniqueness), you must use the full commit revision hash when referring to the sources.

The full 40-character hash can be copied from the github web interface or by cloning the
repository and using
git rev-parse TAG

In this example, TAG is the tag for the source revision we are interested in.

Once the commit hash is known, you can define it in your spec file as follows:


%global commit c5a4525bfa3bd9997834d0603c40093e50e3fd19
%global shortcommit %(c=%{commit}; echo ${c:0:7})

For the source tarball, you should use this syntax:

Source0: https://github.com/$OWNER/$PROJECT/archive/%{commit}/%{name}-%{version}.tar.gz

...

%prep
%setup -qn %{name}-%{commit}

In this syntax, $OWNER must be replaced with the github username for the project's owner, and
$PROJECT must be replaced with the github identifier for the project.

If the release corresponds to a github Tag with a sane numeric version, you must use that version to
populate the Version field in the spec file. If it does not, look at the source code to see if
a version is indicated there, and use that value. If no numeric version is indicated in the code,
you may set Version to 0, and treat the package as a "pre-release" package (and make use of the
%{shortcommit} macro). See [[Packaging:NamingGuidelines#Pre-Release_packages]] for details.

Alternately, if you are using a specific revision from github that is either a pre-release revision
or a post-release revision, you must follow the "snapshot" guidelines. They are
documented here: [[Packaging:NamingGuidelines#Snapshot_packages]]. You can substitute %{shortcommit}
for %{checkout} in that section.

Keep in mind that github tarballs are generated on-demand, so their modification dates will vary and
cause checksum tests to fail. Reviewers will need to use diff -r to verify the tarballs.
}}}
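
The draft names diff -r as the verification tool. For anyone who wants to script the same check, here is a hedged sketch of the idea: compare two unpacked trees by file content only, so that differing modification dates do not matter. The function names `tree_digests` and `same_tree` are made up for this example, and the sketch ignores empty directories, permissions and symlink targets.

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def same_tree(a, b):
    """True when both trees hold the same files with identical contents."""
    return tree_digests(a) == tree_digests(b)
```

To use it, unpack the tarball from the SRPM and a freshly downloaded one into two separate directories and call `same_tree` on them.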

Couple of additions. First one makes clear that this is about projects that don't produce archives for releases rather than anything hosted on github. Second just adds some helpful information.

Replying to [comment:11 spot]:

=== Github ===

As many upstreams use github for their source control, it is worth covering how to
handle that source in a Fedora Package.

Github provides a mechanism to create tarballs on demand, either from a specific commit
revision, or from a specific tag.

[addition]

If the upstream does not create tarballs for releases, you can use this mechanism to produce them.

[end addition]

For a number of reasons (immutability, availability,
uniqueness), you must use the full commit revision hash when referring to the sources.

The full 40-character hash can be copied from the github web interface

[addition]

at https://github.com/OWNER/PROJECT/tags

[end addition]

or by cloning the
repository and using
git rev-parse TAG

Replying to [comment:10 spot]:

> Troublesome URLs section, as described in comment #6 is approved (+1:5, 0:0, -1:0)
>
> We are going to inform Panu (upstream RPM) that we are adding policy which uses this feature (quirk?) of RPM, and if he is unhappy, we will revisit it.

Doesn't seem particularly evil to me... all rpmbuild really cares about is the "basename" of the source. If you don't mind knowingly putting garbage URLs into specs, I'm not going to care either :)

Replying to [comment:11 spot]:

[snip]

For the source tarball, you should use this syntax:

Source0: https://github.com/$OWNER/$PROJECT/archive/%{commit}/%{name}-%{version}.tar.gz

[snip]

If no numeric version is indicated in the code,
you may set Version to 0, and treat the package as a "pre-release" package (and make use of the
%{shortcommit} macro).

Which means a lot of different sources, all named %{name}-0.tar.gz. Perhaps not a problem, but my gut feeling is bad. I would prefer %{name}-%{shortcommit}, at least when there's no usable version.

EDIT: Removed previous edit, which just was silly "blushes".

Written up, about to be announced.

So this text has come up twice on devel@ recently, and I've just run up against it again in a package review:

https://bugzilla.redhat.com/show_bug.cgi?id=1047510

I think I'm reading FPC's intent correctly - that the text is intended to essentially state "don't use github's ability to generate tarballs from version tags, even if you want to ship a specific version, generate a tarball from the commit id currently associated with that tag" - but I don't think this is absolutely clear from the text as it stands. Could a very specific statement to this effect be added?

Yes, the problem is that the commit a version tag points to can change, while a commit ID can't. So if you want to be sure of downloading the same tarball, you can't use a version tag.

If you want to change the wording in the documentation, feel free to propose new wording.

Metadata Update from @toshio:
- Issue assigned to spot

7 years ago
