[Buildroot] [PATCH 4/9 v2] WIP: support/download: change format of archives generated from git
Vincent Fazio
vfazio at xes-inc.com
Tue Dec 15 15:55:27 UTC 2020
Yann,
On 12/14/20 11:29 AM, Yann E. MORIN wrote:
> ** WIP: needs an update to all the hashes.
>
> Currently, our git archives are reproducible because we ensure that we
> use one of the few tar versions that generate identical gnu-formatted
> archives. However, that means that any tar version greater than or equal
> to 1.30 is not compatible. I.e. we're stuck in the past, forever.
>
> However, thanks to some grunt work by Vincent, we now have a set of
> options that we can pass tar, to generate reproducible archives back
> from tar-1.27 and up through tar-1.32, the latest released version.
>
> However, those archives are not identical to the previous ones generated
> in the (now-broken) gnu format.
>
> To avoid any clashing between old and new archives, and new and old
> Buildroot versions, we need to name the new generated archives
> differently from the existing ones.
>
> So, we bump the git-specific sub-version (not to be confused with
> subversion!) to _br1, not unlike the Debian packaging versioning.
>
> We could also keep the gzip compression (and the .gz extension), but
> while at it, let's also switch the compression, from the venerable gzip,
> to the not-so-new-nowadays xz. But since xz is much slower than gzip, we
> add traces that something is going on, so users do not wonder why there
> does not seem to be any progress.
>
> The --pax-option, to set specific PAX headers, does not accept RFC2822
> timestamps whose values are too far away from some fixed point (set at
> compile-time?):
> tar: Time stamp is out of allowed range
>
> However, the same timestamp passed as strictly compliant ISO 8601 is
> accepted, so that's what we switch to as the date representation (%ci
> has been supported by git back to 1.6.0, released August 2008).
>
> Signed-off-by: Yann E. MORIN <yann.morin.1998 at free.fr>
> Cc: Vincent Fazio <vfazio at xes-inc.com>
> Cc: Thomas Petazzoni <thomas.petazzoni at bootlin.com>
>
> --------
> PS. Here is a Makefile used to test all the versions of tar, along with
> a set of options:
>
> # Versions prior to 1.27 do not build on recent machines, because 'gets'
> # got removed (rightfully so), so don't count them as candidates.
> VERSIONS = 1.27 1.27.1 1.28 1.29 1.30 1.31 1.32
> DATE = Thu 21 May 2020 06:44:11 PM CEST
>
> TARS = \
> $(patsubst %,test_gnu_%.tar,$(VERSIONS)) \
> $(patsubst %,test_posix_%.tar,$(VERSIONS)) \
> $(patsubst %,test_posix_paxoption_%.tar,$(VERSIONS))
>
> all: $(TARS)
> sha1sum $(^)
>
> .INTERMEDIATE: test_%.tar
> test_gnu_%.tar: tar.% list
> ./$(<) cf - -C test \
> --transform="s#^\./#test-version/#" \
> --numeric-owner --owner=0 --group=0 \
> --mtime="$(DATE)" \
> --format=gnu \
> -T list \
> >$(@)
> test_posix_%.tar: tar.% list
> ./$(<) cf - -C test \
> --transform="s#^\./#test-version/#" \
> --numeric-owner --owner=0 --group=0 \
> --mtime="$(DATE)" \
> --format=posix \
> -T list \
> >$(@)
> test_posix_paxoption_%.tar: tar.% list
> ./$(<) cf - -C test \
> --transform="s#^\./#test-version/#" \
> --numeric-owner --owner=0 --group=0 \
> --mtime="$(DATE)" \
> --format=posix \
> --pax-option='delete=atime,delete=ctime,delete=mtime' \
> --pax-option='exthdr.name=%d/PaxHeaders/%f,exthdr.mtime={$(DATE)}' \
> -T list \
> >$(@)
>
> list: .FORCE
> list: test
> (cd test && find . -not -type d ) |LC_ALL=C sort >$(@)
>
> LONG = L$$(for i in $$(seq 1 200); do printf 'o'; done)ng
> test: .FORCE
> test:
> rm -rf test
> mkdir -p test/bar
> echo foo >test/Foo
> echo bar >test/bar/Bar
> ln -s bar/Bar test/buz
> echo long >test/Very-$(LONG)-filename
> ln test/Very-$(LONG)-filename \
> test/short
>
> .PRECIOUS: tar.%
> tar.%: tar-%
> cd $(<) && ./configure
> $(MAKE) -C $(<)
> install -m 0755 $(<)/src/tar $(@)
>
> .PRECIOUS: tar-%
> tar-%: tar-%.tar.gz
> tar xzf $(<)
>
> .PRECIOUS: tar-%.tar.gz
> tar-%.tar.gz:
> wget "https://ftp.gnu.org/gnu/tar/$(@)"
>
> .FORCE:
>
> clean:
> rm -rf tar-* tar.* test_* test list
> ---
> package/pkg-download.mk | 3 +++
> support/download/git | 19 +++++++++++++------
> 2 files changed, 16 insertions(+), 6 deletions(-)
>
> diff --git a/package/pkg-download.mk b/package/pkg-download.mk
> index 951d2fb554..e85f844b45 100644
> --- a/package/pkg-download.mk
> +++ b/package/pkg-download.mk
> @@ -17,6 +17,9 @@ export HG := $(call qstrip,$(BR2_HG))
> export SCP := $(call qstrip,$(BR2_SCP))
> export LOCALFILES := $(call qstrip,$(BR2_LOCALFILES))
>
Should we add documentation/comments here explaining what circumstances warrant a bump of the BR_SUB_VERSION value?
For example, any change to the tar command or compressor in support/download/<backend> that changes the hash of a
package? If we only bump this when the tarball hash changes, a more descriptive name (BR_TAR_FORMAT_REV_<method>)
might be self-documenting.
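To illustrate why the bump matters, here is a minimal sketch of how a sub-version and an extension could end up in the
archive name. The helper name archive_name and the <name>-<version><sub-version>.tar<ext> layout are assumptions for
illustration only, not necessarily what the package infrastructure does:

```shell
#!/bin/sh
# Hypothetical helper: compose the file name of a generated archive.
# Assumed layout: <name>-<version><sub-version>.tar<ext>
archive_name() {
    printf '%s-%s%s.tar%s\n' "$1" "$2" "$3" "$4"
}

# Archive in the old format (no sub-version, gzip) ...
archive_name foo 1.2.3 "" .gz
# ... and in the new format (_br1, xz): the names cannot clash.
archive_name foo 1.2.3 _br1 .xz
```

With distinct names, an old Buildroot never picks up a new-format archive from a shared download directory, and vice
versa.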
> +BR_SUB_VERSION_git = _br1
> +BR_SITE_METHOD_Z_git = .xz
Is there a more descriptive name we can use to denote what it's being used for? "Z" doesn't tell me much. TAR_EXT or
some such is a bit more self-documenting.
I know adding more parameters to dl-wrapper to pass to the backends is always met with some resistance, so I won't go so
far as to say we should pass the compressor (`xz -9`, `gzip -6`, `cat`) from pkg-download.mk to dl-wrapper. Some in-file
documentation mentioning the link between this value and the compressor used in support/download/git should be
sufficient, since this extension also determines what we use to decompress the tarball (from
pkg-utils.mk/pkg-generic.mk). I'm sure we won't forget, but a few extra comments about the need to keep these in sync
won't hurt.
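As a rough sketch of the linkage I mean (not a proposal for the actual interface), the extension and the compressor
could be tied together in one place. The function name compressor_for_ext is hypothetical; the compressor commands are
the ones already in use or mentioned above:

```shell
#!/bin/sh
# Hypothetical helper: map an archive extension to the compressor command
# that must be used to produce it, so the two cannot drift apart.
compressor_for_ext() {
    case "$1" in
        .xz) printf 'xz -9\n' ;;
        .gz) printf 'gzip -6 -n\n' ;;
        "")  printf 'cat\n' ;;
        *)   printf 'unsupported extension: %s\n' "$1" >&2
             return 1 ;;
    esac
}

# Possible use in a backend (word splitting of the result is intended):
#   $(compressor_for_ext "${ext}") <"${output}.tar" >"${output}"
```

Even if nothing like this is added, a comment next to each of the two places would serve the same purpose.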
> +
> DL_WRAPPER = support/download/dl-wrapper
>
> # DL_DIR may have been set already from the environment
> diff --git a/support/download/git b/support/download/git
> index 15d8c66e05..b670b23a67 100755
> --- a/support/download/git
> +++ b/support/download/git
> @@ -170,8 +170,8 @@ _git checkout -f -q "'${cset}'"
> _git clean -ffdx
>
> # Get date of commit to generate a reproducible archive.
> -# %cD is RFC2822, so it's fully qualified, with TZ and all.
> -date="$( _git log -1 --pretty=format:%cD )"
> +# %ci is ISO 8601, so it's fully qualified, with TZ and all.
> +date="$( _git log -1 --pretty=format:%ci )"
>
> # There might be submodules, so fetch them.
> if [ ${recurse} -eq 1 ]; then
> @@ -201,12 +201,19 @@ find . -not -type d \
> -and -not -path "./.git/*" >"${output}.list"
> LC_ALL=C sort <"${output}.list" >"${output}.list.sorted"
>
> -# Create GNU-format tarballs, since that's the format of the tarballs on
> -# sources.buildroot.org and used in the *.hash files
> +# Explicit options to ensure reproducibility of the archive
> +pax_options="delete=atime,delete=ctime,delete=mtime"
> +pax_options+=",exthdr.name=%d/PaxHeaders/%f,exthdr.mtime={${date}}"
> +
> +# Create tarballs in the posix format, since that's the most
> +# reproducible format
> +printf 'Creating tarball (%d files)...\n' "$( cat "${output}.list.sorted" |wc -l )"
> tar cf - --transform="s#^\./#${basename}/#" \
> - --numeric-owner --owner=0 --group=0 --mtime="${date}" --format=gnu \
> + --numeric-owner --owner=0 --group=0 --mtime="${date}" \
> + --format=posix --pax-option="${pax_options}" \
> -T "${output}.list.sorted" >"${output}.tar"
> -gzip -6 -n <"${output}.tar" >"${output}"
> +printf 'Compressing tarball (%d bytes)...\n' "$( stat -c %s "${output}.tar" )"
> +xz -9 <"${output}.tar" >"${output}"
>
> rm -f "${output}.list"
> rm -f "${output}.list.sorted"
>
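For anyone who wants to check the reproducibility claim locally, here is a minimal sketch built around the tar options
from the patch. The function name make_archive and the pkg-1.0 prefix are made up for the example, and it assumes GNU
tar is the tar found in PATH:

```shell
#!/bin/sh
# Sketch: archive directory $1 into tarball $2, forcing timestamp $3,
# using the same tar options as the patch. Runs in a subshell so that
# its variables do not leak into the caller.
make_archive() (
    src="$1"; out="$2"; date="$3"
    list="$(mktemp)"
    # Sorted file list, so member order does not depend on the filesystem
    ( cd "${src}" && find . -not -type d ) | LC_ALL=C sort >"${list}"
    pax_options="delete=atime,delete=ctime,delete=mtime"
    pax_options="${pax_options},exthdr.name=%d/PaxHeaders/%f,exthdr.mtime={${date}}"
    tar cf - -C "${src}" --transform="s#^\./#pkg-1.0/#" \
        --numeric-owner --owner=0 --group=0 --mtime="${date}" \
        --format=posix --pax-option="${pax_options}" \
        -T "${list}" >"${out}"
    rm -f "${list}"
)
```

Running it on two checkouts with identical content but different on-disk timestamps, then comparing the results with
cmp or sha256sum, should show whether a given tar version produces identical archives.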
--
Vincent Fazio
Embedded Software Engineer - Linux
Extreme Engineering Solutions, Inc
http://www.xes-inc.com