The SHA1 replacement plan


The SHA1 replacement plan

Joerg Sonnenberger
Hello all,
as you might have seen from various changesets in the recent past, I'm
actively looking into the steps required to retire SHA1. The current
goal is to focus on new repositories and/or a one-time conversion, with
the option of partial compatibility with old changeset ids as a
follow-up.

At this point there are a number of bigger questions for integration
that require some decision making.

(1) Which hash function to pick.

The most obvious candidate is SHA2-256. The advantage is that it is
widely supported and some CPUs have hardware acceleration for it. The
downside is that SHA2 has a structure similar to SHA1, and much of the
cryptanalysis of SHA1 and the resulting attacks applies to SHA2 as
well. At this point, the primary protection of SHA2 is only the larger
hash size, and it is not clear what the security margin actually is.
There are also structural weaknesses that can apply to our specific
use, in the form of length extension attacks. This is the choice picked
by Git, BTW, but personally I'm not really comfortable with it.

The second most widely supported hash function would be BLAKE2s. It is
supported by hashlib out of the box and is typically faster than
software implementations of SHA2-256. It has its own RFC (RFC 7693).
Its core is based on one of the SHA3 finalists, with a reduced number
of rounds and a different block combination scheme.

The third category would be picking one of the more modern,
speed-optimised variants of the SHA3 finalists. K12 and BLAKE3 are both
derived from SHA3 finalists, playing with the number of rounds and
using a different block combination scheme. The BLAKE3 scheme
parallelizes better, which is a great advantage for larger files, and
SIMD support (both AVX2 and AVX512, for example) benefits medium to
large files (> 4KB). There are concerns about the security margin here,
given the massively reduced number of rounds (even compared to BLAKE2);
the BLAKE3* variant in the benchmarks below uses the same number of
rounds as BLAKE2. K12 is a similar design based on the SHA3 sponge
function. It doesn't scale as well for medium file sizes, e.g. in the
1-8 KB range, but is otherwise more sound in its choices. Neither
algorithm is currently supported out of the box by Python or OpenSSL,
so a pure Python version would be slow.
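
Of the candidates above, only SHA2-256, SHA3-256 and BLAKE2s are
reachable from Python's standard hashlib; a minimal sketch of what is
and isn't available out of the box:

```python
import hashlib

data = b"some revlog chunk"

# All three produce 32-byte (256-bit) digests.
print(hashlib.sha256(data).hexdigest())    # SHA2-256
print(hashlib.sha3_256(data).hexdigest())  # SHA3-256
print(hashlib.blake2s(data).hexdigest())   # BLAKE2s

# K12 and BLAKE3 would need third-party bindings.
print("blake3" in hashlib.algorithms_available)  # typically False
```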

I've attached basic benchmark numbers below. The asm variant uses
whatever my Threadripper supports in terms of low-level primitives,
e.g. AVX2 and the SHA extensions, either from OpenSSL (BLAKE2, SHA2,
SHA3) or from the reference implementations (K12, BLAKE3, BLAKE3*). The
test case was hashing a large file (~7GB).

BLAKE2s256 (asm)  13.8s
SHA2-256   (asm)   4.5s
SHA2-256   (C)    28.0s
SHA3-256   (asm)  16.7s
SHA3-256   (C)    19.8s
K12        (asm)   5.9s
K12        (C)     9.2s
BLAKE3     (asm)   4.1s
BLAKE3     (C)    10.1s
BLAKE3*    (asm)   5.5s
BLAKE3*    (C)    13.8s

We try to avoid hashing, but it is still somewhat on the fast path,
especially for finding out whether a file really changed and during
commits, so the function should be picked carefully.
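
For a rough sense of how the hashlib-backed candidates compare on any
given machine, a minimal timing sketch (a much smaller buffer than the
7GB test above, so absolute numbers will differ):

```python
import hashlib
import time

# 16 MB stand-in for the large test file.
buf = b"\x00" * (16 * 1024 * 1024)

for name in ("sha1", "sha256", "sha3_256", "blake2s"):
    h = hashlib.new(name)
    start = time.perf_counter()
    h.update(buf)
    h.digest()
    print(f"{name:10s} {time.perf_counter() - start:.3f}s")
```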

(2) How to deal with tests

Pretty much every single test in the tree has to be adjusted: some for
the new repo flag, but almost all of them for the changed changeset ids
in one form or another. I was looking at providing a global
replacement, but this is tricky and doesn't really help with the
various tests that depend on output sizes. At the moment I'm inclined
to copy the tests piecewise and update them to match the new hashes,
but otherwise keep them duplicated.

(3) mercurial.nodes

This is the biggest pain point after the test cases. We use nullid and
friends all over the place, and the new constant depends on the size of
the hash function. Some places would be better served by a predicate
(isnullid) that matches both values. Many other places would need to
obtain the correct id from the repository, which can require API
changes to push the repo down, etc. I'm not sure about the big picture
yet.
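
The predicate idea can be sketched as follows; the helper name and the
assumption of 20-byte legacy vs. 32-byte new null ids are illustrative,
not existing Mercurial API:

```python
# Hypothetical widths: 20 bytes for legacy SHA1 nodes,
# 32 bytes for the new hash.
SHA1_NULLID = b"\x00" * 20
NEWHASH_NULLID = b"\x00" * 32

def isnullid(node):
    """True if `node` is the null node in either the legacy
    20-byte format or the new 32-byte format."""
    return node in (SHA1_NULLID, NEWHASH_NULLID)

print(isnullid(b"\x00" * 20))             # True
print(isnullid(b"\x01" + b"\x00" * 19))   # False
```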

(4) Backward compatibility

The revlog design at the moment doesn't allow multiple hashes in
parallel, and the manifest design makes computing them in parallel
difficult as well. The plan for upgrading repositories would therefore
be to build a static New Hash -> legacy SHA1 map that can be
distributed to clients, so that look-ups by old changeset id can use
the side table as a fallback. This would only cover old changesets; new
changesets would be New Hash-only.
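
The fallback lookup can be sketched like this, assuming the distributed
map has been inverted into old-id -> new-id form on the client; the
names and in-memory dicts are illustrative only:

```python
# Side table shipped once per converted repo, inverted for lookup:
# old 20-byte SHA1 id -> new 32-byte id. Entries are illustrative.
legacy_map = {
    bytes.fromhex("aa" * 20): bytes.fromhex("11" * 32),
}

# Native index keyed by new-hash ids only.
native_index = {bytes.fromhex("11" * 32): "changeset data"}

def lookup(node):
    # Fast path: native new-hash lookup.
    if node in native_index:
        return native_index[node]
    # Fallback: old ids only exist for pre-conversion history.
    new_node = legacy_map.get(node)
    if new_node is not None:
        return native_index[new_node]
    raise LookupError("unknown node")
```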

Joerg
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

Re: The SHA1 replacement plan

Augie Fackler-2
On Sun, Jul 26, 2020 at 06:26:51PM +0200, Joerg Sonnenberger wrote:
> Hello all,
> as you might have seen by various changesets in the recent past, I'm
> looking actively into the steps required to retire SHA1. The current
> goal is to focus on new repositories and/or a one-time conversion with
> the option of partial compatibility with old changeset ids as follow-up.

Yay! I've given this some thought, but haven't been able to get time
to work on this.

>
> At this point there are a number of bigger questions for integration
> that require some decision making.
>
> (1) Which hash function to pick.
>
> The most obvious candidate is SHA2-256. The advantage is that it is
> widely supported and some CPUs have hardware acceleration for it. The
> downside is that SHA2 has a structure similar to SHA1, and much of the
> cryptanalysis of SHA1 and the resulting attacks applies to SHA2 as
> well. At this point, the primary protection of SHA2 is only the larger
> hash size, and it is not clear what the security margin actually is.
> There are also structural weaknesses that can apply to our specific
> use, in the form of length extension attacks. This is the choice
> picked by Git, BTW, but personally I'm not really comfortable with it.

Agreed. That said, in general we should be reserving bits in the
format to allow the _next_ hash function change to be easy: rather
than picking one today, we should lay the groundwork for moving to
SHA4 (or whatever) in 20x6 as well as moving to $SOMETHING today.

My strategy for this was to use blake2b, but configured to a 31-byte
hash, so we can reserve one byte for signalling what hash is in
use. 0x00 would mean sha1 by default, 0x01 would be blake2b, and 0xfe
would be reserved for "proprietary use" - that could either be in
tests (I'd actually like a way to configure a wicked-fast "hash"
function for tests, and this would give us that opening) or for things
like Google's fake hashes.
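
This scheme can be sketched with hashlib's variable digest size; the
signal-byte values mirror the proposal above, everything else
(function name, input) is illustrative:

```python
import hashlib

# Proposed signal bytes: 0x00 = sha1, 0x01 = blake2b,
# 0xfe = reserved for "proprietary use".
HASH_SHA1 = 0x00
HASH_BLAKE2B = 0x01
HASH_PROPRIETARY = 0xfe

def newnode(data):
    """A 32-byte node id: one signal byte plus a 31-byte
    blake2b digest."""
    digest = hashlib.blake2b(data, digest_size=31).digest()
    return bytes([HASH_BLAKE2B]) + digest

node = newnode(b"revision data")
assert len(node) == 32 and node[0] == HASH_BLAKE2B
```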

>
> The second most widely supported hash function would be BLAKE2s.

I've been strongly favoring blake2b for years now. Why prefer s over b?

(I'll poke some crypto nerds about their informed opinion.)

> It is supported by hashlib out of the box and is typically faster
> than software implementations of SHA2-256. It has its own RFC (RFC
> 7693). Its core is based on one of the SHA3 finalists, with a reduced
> number of rounds and a different block combination scheme.
>
> The third category would be picking one of the more modern,
> speed-optimised variants of the SHA3 finalists. K12 and BLAKE3 are
> both derived from SHA3 finalists, playing with the number of rounds
> and using a different block combination scheme. The BLAKE3 scheme
> parallelizes better, which is a great advantage for larger files, and
> SIMD support (both AVX2 and AVX512, for example) benefits medium to
> large files (> 4KB). There are concerns about the security margin
> here, given the massively reduced number of rounds (even compared to
> BLAKE2); the BLAKE3* variant uses the same number of rounds as
> BLAKE2. K12 is a similar design based on the SHA3 sponge function. It
> doesn't scale as well for medium file sizes, e.g. in the 1-8 KB
> range, but is otherwise more sound in its choices. Neither algorithm
> is currently supported out of the box by Python or OpenSSL, so a pure
> Python version would be slow.

Last I knew there was only a Rust impl of blake3, and it hasn't gotten
as much crypto scrutiny. A crypto expert at Google was the one who
pushed me to blake2b, so if we go in a non-blake2 direction I'd want to
run it by them.

>
> I've attached basic benchmark numbers below. The asm variant is using
> whatever my Threadripper supports in terms of low-level primitives, e.g.
> AVX2 and the SHA extension, either from OpenSSL (BLAKE2, SHA2, SHA3) or
> the reference implementations (K12, BLAKE3, BLAKE3*). Test case was
> hashing a large file (~7GB).
>
> BLAKE2s256 (asm)  13.8s
> SHA2-256   (asm)   4.5s
> SHA2-256   (C)    28.0s
> SHA3-256   (asm)  16.7s
> SHA3-256   (C)    19.8s
> K12        (asm)   5.9s
> K12        (C)     9.2s
> BLAKE3     (asm)   4.1s
> BLAKE3     (C)    10.1s
> BLAKE3*    (asm)   5.5s
> BLAKE3*    (C)    13.8s
>
> We try to avoid hashing, but it is still somewhat in the fast path
> especially for finding out whether a file really was changed or during
> commits, so it is something that should be carefully picked.
>
> (2) How to deal with tests
>
> Pretty much every single test in the tree has to be adjusted. Some for
> the new repo flag, but almost all of them for the changed changeset ids
> in one form or another. I was looking at providing a global replacement
> but this is both tricky and doesn't really help with the various tests
> that depend on output sizes. At the moment I'm inclined to copy the
> tests piecewise and update them to match the new hashes, but otherwise
> keep them as duplicated.

My inclination was to try some "test golf" and get hashes out of most
tests. I think we should have a handful of tests that care about
hashes, and most shouldn't. For the initial porting effort: add a
config knob, off by default, to enable blake2b, then write a single
test that exercises those codepaths.

>
> (3) mercurial.nodes
>
> This is the biggest point of pain after the test cases. We are using
> nullid and friends all over the place and the new constant depends on
> the hash function size. Some places would be better served by having a
> predicate (isnullid) that could match against both values. Many other
> places would need to obtain the correct id from the repository and that
> can result in necessary API changes to push repo down etc. Not sure
> about the big picture yet.

If (big if) we can move the whole codebase to 32-byte hashes, then we
can just carry 32-byte nullid et al. If that won't work, probably we
write sad helper functions to do things like isnullid()...

>
> (4) Backward compatibility
>
> The revlog design at the moment doesn't allow multiple hashes in
> parallel and the manifest design makes computing them in parallel
> difficult as well. The plan for upgrading repositories therefore would
> be to build a static New Hash -> legacy SHA1 map that can be distributed
> to clients, so that look-ups by old changeset ID can use the side table
> as fallback. This would only support old changesets, new changesets are
> New Hash-only.

This sucks, because it means migration implies re-cloning rather than
just an incremental pull. But if you're going to do the work, it's
probably good enough.

>
> Joerg

Re: The SHA1 replacement plan

Josef 'Jeff' Sipek
In reply to this post by Joerg Sonnenberger
On Sun, Jul 26, 2020 at 18:26:51 +0200, Joerg Sonnenberger wrote:
...
> I've attached basic benchmark numbers below. The asm variant is using
> whatever my Threadripper supports in terms of low-level primitives, e.g.
> AVX2 and the SHA extension, either from OpenSSL (BLAKE2, SHA2, SHA3) or
> the reference implementations (K12, BLAKE3, BLAKE3*). Test case was
> hashing a large file (~7GB).

While these performance measurements are important, it is also
important to make sure that older (or less "top of the line") hardware
isn't completely terrible.  For example, it is completely reasonable
(at least IMO) to still use a Sandy Bridge-era CPU.  Likewise, it is
reasonable to run hg on an embedded system (although those tend to have
wimpy I/O as well).

Anyway, thanks for spending time on this!

Jeff.

--
All parts should go together without forcing.  You must remember that the
parts you are reassembling were disassembled by you.  Therefore, if you
can’t get them together again, there must be a reason.  By all means, do not
use a hammer.
                — IBM Manual, 1925

Re: The SHA1 replacement plan [Hash function part]

Joerg Sonnenberger
In reply to this post by Augie Fackler-2
On Tue, Jul 28, 2020 at 02:31:09PM -0400, Augie Fackler wrote:
> > The second most widely supported hash function would be BLAKE2s.
>
> I've been strongly favoring blake2b for years now. Why prefer s over b?

Performance of blake2b on 32-bit platforms is ...bad. blake2s works
reasonably well on both. That's also one of the reasons to prefer the
successors: they don't have the same problem.

> > It is supported by hashlib out of the box and is typically faster
> > than software implementations of SHA2-256. It has its own RFC (RFC
> > 7693). Its core is based on one of the SHA3 finalists, with a
> > reduced number of rounds and a different block combination scheme.
> >
> > The third category would be picking one of the more modern,
> > speed-optimised variants of the SHA3 finalists. K12 and BLAKE3 are
> > both derived from SHA3 finalists, playing with the number of rounds
> > and using a different block combination scheme. The BLAKE3 scheme
> > parallelizes better, which is a great advantage for larger files,
> > and SIMD support (both AVX2 and AVX512, for example) benefits
> > medium to large files (> 4KB). There are concerns about the
> > security margin here, given the massively reduced number of rounds
> > (even compared to BLAKE2); the BLAKE3* variant uses the same number
> > of rounds as BLAKE2. K12 is a similar design based on the SHA3
> > sponge function. It doesn't scale as well for medium file sizes,
> > e.g. in the 1-8 KB range, but is otherwise more sound in its
> > choices. Neither algorithm is currently supported out of the box by
> > Python or OpenSSL, so a pure Python version would be slow.
>
> Last I knew there was only a Rust impl of blake3, and it hasn't gotten
> as much crypto scrutiny. A crypto expert at Google was who pushed me
> to blake2b, so if we go a non-blake2 direction I'd want to run it by them.

There is a C implementation of BLAKE3 as well. There are valid concerns
about the round reduction, and I'm not sure we want to go with a custom
variant; that speaks more in favor of K12. K12 can also use the ARM
SHA3 acceleration features.

Joerg

Re: The SHA1 replacement plan

Joerg Sonnenberger
In reply to this post by Josef 'Jeff' Sipek
On Tue, Jul 28, 2020 at 03:02:46PM -0400, Josef 'Jeff' Sipek wrote:

> On Sun, Jul 26, 2020 at 18:26:51 +0200, Joerg Sonnenberger wrote:
> ...
> > I've attached basic benchmark numbers below. The asm variant is using
> > whatever my Threadripper supports in terms of low-level primitives, e.g.
> > AVX2 and the SHA extension, either from OpenSSL (BLAKE2, SHA2, SHA3) or
> > the reference implementations (K12, BLAKE3, BLAKE3*). Test case was
> > hashing a large file (~7GB).
>
> While these performance measurements are important, it is also important to
> make sure that older (or less "top of the line") hardware isn't completely
> terrible.  For example, it is completely reasonable (at least IMO) to still
> use a Sandy Bridge-era CPUs.  Likewise, it is reasonable to run hg on a
> embedded system (although those tend to have wimpy I/O as well).

Yeah, that's why I included the C variant. Those are essentially
straightforward implementations used in pkgsrc's digest tool. Ideally,
we can pick a function that is not too much worse than SHA1. Just to
establish that baseline:

SHA1 (asm) 4.8s
SHA1 (C)   10.7s

So K12 is somewhat slower than SHA1 on a Threadripper, but should be
somewhat faster on hardware without specific acceleration. SHA support
on the Zen1 Threadripper is quite fast.

Joerg

Re: The SHA1 replacement plan [Tests, mercurial.nodes handling, compat]

Joerg Sonnenberger
In reply to this post by Augie Fackler-2
On Tue, Jul 28, 2020 at 02:31:09PM -0400, Augie Fackler wrote:

> > (2) How to deal with tests
> >
> > Pretty much every single test in the tree has to be adjusted. Some for
> > the new repo flag, but almost all of them for the changed changeset ids
> > in one form or another. I was looking at providing a global replacement
> > but this is both tricky and doesn't really help with the various tests
> > that depend on output sizes. At the moment I'm inclined to copy the
> > tests piecewise and update them to match the new hashes, but otherwise
> > keep them as duplicated.
>
> My inclination was to try and do some "test golf" and get hashes out
> of most tests. I think we should have a handful of tests that care
> about hashes, and most shouldn't. For the initial porting effort, a
> config knob, off-by-default to enable blake2b, then write a single
> test that exercises those codepaths.

Yeah, I was looking into adjusting test cases semi-automatically by
having a mapping between old-style and new-style changeset ids. It's
annoying to compute due to the whole manifest link thing. It would also
need to be reversed at the end, because we always diff against the base
version of the test case. That was the part where I essentially gave
up, especially as it doesn't even help with all the test cases that cut
or change output (*cough* test-http-bad-server.t, I'm looking at you).
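
The rewrite step itself can be sketched as a plain regex substitution
once the id mapping exists; computing that mapping (the manifest-link
problem above) is the hard part and is assumed given here, with purely
illustrative entries:

```python
import re

# Hypothetical mapping: old SHA1 changeset-id prefixes as they appear
# in .t test output -> the corresponding new-hash prefixes.
id_map = {"1e78a93102a3": "f" * 12}

def rewrite_test(text):
    """Replace every known old id in a test file with its new id."""
    pattern = re.compile("|".join(re.escape(k) for k in id_map))
    return pattern.sub(lambda m: id_map[m.group(0)], text)

print(rewrite_test("  $ hg log -r 1e78a93102a3"))
```

Reversing it at the end, as described above, would just apply the
inverted mapping the same way.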

> >
> > (3) mercurial.nodes
> >
> > This is the biggest point of pain after the test cases. We are using
> > nullid and friends all over the place and the new constant depends on
> > the hash function size. Some places would be better served by having a
> > predicate (isnullid) that could match against both values. Many other
> > places would need to obtain the correct id from the repository and that
> > can result in necessary API changes to push repo down etc. Not sure
> > about the big picture yet.
>
> If (big if) we can move the whole codebase to 32-byte hashes, then we
> can just carry 32-byte nullid et al. If that won't work, probably we
> write sad helper functions to do things like isnullid()...

The problem with this is that we would still need to distinguish
between them in various places in order to truncate appropriately. Not
sure that is easier...

> > (4) Backward compatibility
> >
> > The revlog design at the moment doesn't allow multiple hashes in
> > parallel and the manifest design makes computing them in parallel
> > difficult as well. The plan for upgrading repositories therefore would
> > be to build a static New Hash -> legacy SHA1 map that can be distributed
> > to clients, so that look-ups by old changeset ID can use the side table
> > as fallback. This would only support old changesets, new changesets are
> > New Hash-only.
>
> This sucks, because it means migration implies re-cloning rather than
> just an incremental pull. But if you're going to do the work, it's
> probably good enough.

Now if only we had a good protocol for exchanging unversioned additional
data :) More seriously, I think it is reasonable to get the case of new
non-SHA1 repositories working first and then look into how to make an
incremental update path possible. The infrastructure for that has wider
applications too.

Joerg

Re: The SHA1 replacement plan [Hash function part]

Augie Fackler-2
In reply to this post by Joerg Sonnenberger


> On Jul 28, 2020, at 15:29, Joerg Sonnenberger <[hidden email]> wrote:
>
> On Tue, Jul 28, 2020 at 02:31:09PM -0400, Augie Fackler wrote:
>>> The second most widely supported hash function would be BLAKE2s.
>>
>> I've been strongly favoring blake2b for years now. Why prefer s over b?
>
> Performance of blake2b on 32-bit platforms is ...bad. blake2s works
> reasonably well on both. That's also one of the reasons to prefer the
> successors: they don't have the same problem.

Why do we care one iota about 32-bit CPUs? Not trolling, sincere: even Raspberry Pi machines are now 64-bit, so I'm pretty unsympathetic to using a much-less-vetted hash function to help out obsolete hardware.

>
>>> It is supported by hashlib out of the box and is typically faster
>>> than software implementations of SHA2-256. It has its own RFC (RFC
>>> 7693). Its core is based on one of the SHA3 finalists, with a
>>> reduced number of rounds and a different block combination scheme.
>>>
>>> The third category would be picking one of the more modern,
>>> speed-optimised variants of the SHA3 finalists. K12 and BLAKE3 are
>>> both derived from SHA3 finalists, playing with the number of rounds
>>> and using a different block combination scheme. The BLAKE3 scheme
>>> parallelizes better, which is a great advantage for larger files,
>>> and SIMD support (both AVX2 and AVX512, for example) benefits
>>> medium to large files (> 4KB). There are concerns about the
>>> security margin here, given the massively reduced number of rounds
>>> (even compared to BLAKE2); the BLAKE3* variant uses the same number
>>> of rounds as BLAKE2. K12 is a similar design based on the SHA3
>>> sponge function. It doesn't scale as well for medium file sizes,
>>> e.g. in the 1-8 KB range, but is otherwise more sound in its
>>> choices. Neither algorithm is currently supported out of the box by
>>> Python or OpenSSL, so a pure Python version would be slow.
>>
>> Last I knew there was only a Rust impl of blake3, and it hasn't gotten
>> as much crypto scrutiny. A crypto expert at Google was who pushed me
>> to blake2b, so if we go a non-blake2 direction I'd want to run it by them.
>
> There is a C implementation of BLAKE3 as well. There are valid concerns
> about the round reduction and I'm not sure if we want to go with a
> custom variant. That speaks more in favor of K12. K12 can use the ARM
> SHA3 acceleration features.

I very much want to stay on the "beaten path" of hash functions, and it took a hash function enthusiast like you to introduce me to K12, which is a strike against it IMO.

Are you aware of any big players caring about/examining blake2s? If not, I'm still much much more comfortable with blake2b as the "right" tradeoff...

>
> Joerg


Re: The SHA1 replacement plan

Josef 'Jeff' Sipek
In reply to this post by Joerg Sonnenberger
On Tue, Jul 28, 2020 at 21:35:51 +0200, Joerg Sonnenberger wrote:

> On Tue, Jul 28, 2020 at 03:02:46PM -0400, Josef 'Jeff' Sipek wrote:
> > On Sun, Jul 26, 2020 at 18:26:51 +0200, Joerg Sonnenberger wrote:
> > ...
> > > I've attached basic benchmark numbers below. The asm variant is using
> > > whatever my Threadripper supports in terms of low-level primitives, e.g.
> > > AVX2 and the SHA extension, either from OpenSSL (BLAKE2, SHA2, SHA3) or
> > > the reference implementations (K12, BLAKE3, BLAKE3*). Test case was
> > > hashing a large file (~7GB).
> >
> > While these performance measurements are important, it is also important to
> > make sure that older (or less "top of the line") hardware isn't completely
> > terrible.  For example, it is completely reasonable (at least IMO) to still
> > use a Sandy Bridge-era CPUs.  Likewise, it is reasonable to run hg on a
> > embedded system (although those tend to have wimpy I/O as well).
>
> Yeah, that's why I included the C variant.

I don't trust compilers *not* to do some massive amount of optimization
unless they are told to target an older CPU.  Also, newer CPUs like to do a
lot of "magic" to speed things up and keep security researchers employed ;)

> Those are essentially
> straight forward implementations used in pkgsrc's digest tool. Ideally,
> we can pick a function that not too much worse than SHA1.

Agreed.

> Just to establish that baseline:
>
> SHA1 (asm) 4.8s
> SHA1 (C)   10.7s
>
> So K12 is somewhat slower on a Threadripper, but should be somewhat
> faster than hardware without specific acceleration. SHA support on the
> Zen1 Threadripper is quite fast.

I think we're in agreement.  The new algo shouldn't be much worse than the
existing SHA1.

For the record: when the time comes, I'm willing to collect some hash perf
data on slightly older/weaker hw as a sanity check.

Jeff.

--
A CRAY is the only computer that runs an endless loop in just 4 hours...

Re: The SHA1 replacement plan [Hash function part]

Joerg Sonnenberger
In reply to this post by Augie Fackler-2
On Tue, Jul 28, 2020 at 03:49:58PM -0400, Augie Fackler wrote:

>
>
> > On Jul 28, 2020, at 15:29, Joerg Sonnenberger <[hidden email]> wrote:
> >
> > On Tue, Jul 28, 2020 at 02:31:09PM -0400, Augie Fackler wrote:
> >>> The second most widely supported hash function would be BLAKE2s.
> >>
> >> I've been strongly favoring blake2b for years now. Why prefer s over b?
> >
> > Performance of blake2b on 32-bit platforms is ...bad. blake2s works
> > reasonably well on both. That's also one of the reasons to prefer
> > the successors: they don't have the same problem.
>
> Why do we care one iota about 32 bit CPUs? Not trolling, sincere: even
> Raspberry Pi machines are now 64-bit, so I'm pretty unsympathetic to
> using a much-less-vetted hash function to help out obsolete hardware.

They use pretty much the same core design, just different "tuning"
choices; there is little evidence that one is more secure than the
other. 32-bit CPUs are still quite popular, but that's a separate
discussion.

> > There is a C implementation of BLAKE3 as well. There are valid concerns
> > about the round reduction and I'm not sure if we want to go with a
> > custom variant. That speaks more in favor of K12. K12 can use the ARM
> > SHA3 acceleration features.
>
> I very much want to stay on the "beaten path" of hash functions and as
> a hash function enthusiast you've introduced me to K12, which is a
> strike against it IMO.

K12 is pretty much on the beaten path. It is essentially to SHA3 what
BLAKE2 is to BLAKE: a reduction in the number of rounds of the hash
function, plus a tree hash mode to allow hashing blocks in parallel.

Joerg

Re: The SHA1 replacement plan

Joerg Sonnenberger
In reply to this post by Josef 'Jeff' Sipek
On Tue, Jul 28, 2020 at 03:55:55PM -0400, Josef 'Jeff' Sipek wrote:

> On Tue, Jul 28, 2020 at 21:35:51 +0200, Joerg Sonnenberger wrote:
> > On Tue, Jul 28, 2020 at 03:02:46PM -0400, Josef 'Jeff' Sipek wrote:
> > > On Sun, Jul 26, 2020 at 18:26:51 +0200, Joerg Sonnenberger wrote:
> > > ...
> > > > I've attached basic benchmark numbers below. The asm variant is using
> > > > whatever my Threadripper supports in terms of low-level primitives, e.g.
> > > > AVX2 and the SHA extension, either from OpenSSL (BLAKE2, SHA2, SHA3) or
> > > > the reference implementations (K12, BLAKE3, BLAKE3*). Test case was
> > > > hashing a large file (~7GB).
> > >
> > > While these performance measurements are important, it is also important to
> > > make sure that older (or less "top of the line") hardware isn't completely
> > > terrible.  For example, it is completely reasonable (at least IMO) to still
> > > use a Sandy Bridge-era CPUs.  Likewise, it is reasonable to run hg on a
> > > embedded system (although those tend to have wimpy I/O as well).
> >
> > Yeah, that's why I included the C variant.
>
> I don't trust compilers *not* to do some massive amount of optimization
> unless they are told to target an older CPU.  Also, newer CPUs like to do a
> lot of "magic" to speed things up and keep security researchers employed ;)

The core of these hash functions isn't by itself very friendly to
compiler optimisations. It's more a question of how bad the automatic
code generation will be.

> > Just to establish that baseline:
> >
> > SHA1 (asm) 4.8s
> > SHA1 (C)   10.7s
> >
> > So K12 is somewhat slower on a Threadripper, but should be somewhat
> > faster than hardware without specific acceleration. SHA support on the
> > Zen1 Threadripper is quite fast.
>
> I think we're in agreement.  The new algo shouldn't be much worse than the
> existing SHA1.
>
> For the record: when the time comes, I'm willing to collect some hash perf
> data on slightly older/weaker hw as a sanity check.

If you have a modern OpenSSL version, you can get the numbers for
sha256, sha3-256, blake2b512 and blake2s256 easily. K12 on non-vector
CPUs, or for short messages, can be reasonably approximated as "half
the sha3-256 time", as that is the primary difference. BLAKE3 is the
only one that would be more tricky.

Joerg

Re: The SHA1 replacement plan

Josef 'Jeff' Sipek
On Tue, Jul 28, 2020 at 22:31:51 +0200, Joerg Sonnenberger wrote:

> On Tue, Jul 28, 2020 at 03:55:55PM -0400, Josef 'Jeff' Sipek wrote:
> > On Tue, Jul 28, 2020 at 21:35:51 +0200, Joerg Sonnenberger wrote:
> > > On Tue, Jul 28, 2020 at 03:02:46PM -0400, Josef 'Jeff' Sipek wrote:
> > > > On Sun, Jul 26, 2020 at 18:26:51 +0200, Joerg Sonnenberger wrote:
> > > > ...
> > > > > I've attached basic benchmark numbers below. The asm variant is using
> > > > > whatever my Threadripper supports in terms of low-level primitives, e.g.
> > > > > AVX2 and the SHA extension, either from OpenSSL (BLAKE2, SHA2, SHA3) or
> > > > > the reference implementations (K12, BLAKE3, BLAKE3*). Test case was
> > > > > hashing a large file (~7GB).
> > > >
> > > > While these performance measurements are important, it is also important to
> > > > make sure that older (or less "top of the line") hardware isn't completely
> > > > terrible.  For example, it is completely reasonable (at least IMO) to still
> > > > use a Sandy Bridge-era CPUs.  Likewise, it is reasonable to run hg on a
> > > > embedded system (although those tend to have wimpy I/O as well).
> > >
> > > Yeah, that's why I included the C variant.
> >
> > I don't trust compilers *not* to do some massive amount of optimization
> > unless they are told to target an older CPU.  Also, newer CPUs like to do a
> > lot of "magic" to speed things up and keep security researchers employed ;)
>
> The core of the hash functions isn't by itself very friendly to compiler
> optimisations. It's more a case of how bad the automatic code generation
> will be.

In general, yes, you are correct.  However, it is a big loop and a compiler
can do a decent amount of pipelining.

> > > Just to establish that baseline:
> > >
> > > SHA1 (asm) 4.8s
> > > SHA1 (C)   10.7s
> > >
> > > So K12 is somewhat slower on a Threadripper, but should be somewhat
> > > faster than hardware without specific acceleration. SHA support on the
> > > Zen1 Threadripper is quite fast.
> >
> > I think we're in agreement.  The new algo shouldn't be much worse than the
> > existing SHA1.
> >
> > For the record: when the time comes, I'm willing to collect some hash perf
> > data on slightly older/weaker hw as a sanity check.
>
> If you have a modern OpenSSL version, you can get the numbers for
> sha256, sha3-256, blake2b512 and blake2s256 easily. K12 for non-vector
> CPUs or short messages can be reasonably approximated as "half of
> sha3-256 time" as that's the primary difference. BLAKE3 is the only one
> that would be more tricky.

I couldn't help myself and I ran it on 3 of my systems.  The results
are...interesting.


Thinkpad T520 (2011 vintage Sandy Bridge i7) with FreeBSD 12:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha1             38552.35k   112781.34k   256808.02k   375173.76k   433724.00k   438686.07k
sha256           26402.90k    67908.11k   131459.19k   171880.98k   188168.04k   189348.31k
sha3-256         13828.62k    55332.70k   130879.40k   152721.79k   169003.69k   170909.80k
blake2b512       23551.38k    94267.47k   240451.58k   311282.76k   342076.92k   343736.32k
blake2s256       27319.90k   107783.29k   162617.17k   187314.17k   195500.87k   196625.70k

Server (2009 vintage Xeon) running OmniOS:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha1             34924.79k   109450.16k   269382.74k   425380.18k   508193.55k   517892.78k
sha256           25028.82k    67870.78k   142345.81k   195099.15k   218278.57k   220632.41k
sha3-256         14281.78k    57115.29k   140363.97k   172130.99k   195810.65k   199223.98k
blake2b512       24261.26k   101366.74k   286062.42k   433886.55k   514296.49k   519682.56k
blake2s256       28521.81k   113023.49k   212133.77k   275259.33k   301708.63k   304365.57k

Raspberry Pi 1B+ running FreeBSD 12:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha1              1084.15k     3711.04k    10173.97k    18059.62k    23704.69k    23738.60k
sha256             942.90k     3105.90k     7843.36k    12770.46k    16318.84k    16399.04k
sha3-256           456.77k     1805.49k     4432.56k     5607.01k     6652.73k     6715.47k
blake2b512         173.38k      694.93k     1646.80k     2107.62k     2320.81k     2328.24k
blake2s256         861.31k     3420.77k     7963.73k    12459.78k    15746.41k    15978.57k


The difference between blake2b512 and blake2s256 on the Raspberry Pi
is huge.  Overall, blake2s256 seems to be on par with or better than sha256.
blake2b512 is pretty good on the 64-bit systems, but completely sucks on the
Pi.  Picking the right algo will be "fun".
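One practical note on the tables above: every contender measured there except K12 and BLAKE3 ships in hashlib's guaranteed set on Python 3.6+, so availability can be verified without extra dependencies (a sketch; the 32-byte blake2b parameterization is only an illustration):

```python
import hashlib

# All the measured candidates except K12 and BLAKE3 are guaranteed by
# hashlib on Python 3.6+, independent of the local OpenSSL build.
candidates = {"sha256", "sha3_256", "blake2b", "blake2s"}
print(candidates <= hashlib.algorithms_guaranteed)  # → True

# blake2b defaults to a 64-byte digest and blake2s to 32 bytes, but both
# are parameterizable, e.g. a 32-byte blake2b for a 256-bit node id:
print(hashlib.blake2b(b"", digest_size=32).digest_size)  # → 32
```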

Anyway, sorry for the "distraction" and thanks again for working on this.

Jeff.

--
Ready; T=0.01/0.01 17:29:33

Re: The SHA1 replacement plan [Hash function part]

Augie Fackler-2


> On Jul 28, 2020, at 16:23, Joerg Sonnenberger <[hidden email]> wrote:
>
> On Tue, Jul 28, 2020 at 03:49:58PM -0400, Augie Fackler wrote:
>>
>>> On Jul 28, 2020, at 15:29, Joerg Sonnenberger <[hidden email]> wrote:
>>>
>>> On Tue, Jul 28, 2020 at 02:31:09PM -0400, Augie Fackler wrote:
>>>>> The second most widely supported hash function would be BLAKE2s.
>>>>
>>>> I've been strongly favoring blake2b for years now. Why prefer s over b?
>>>
>>> Performance on 32bit platforms of blake2b is ...bad. blake2s works
>>> reasonably well on both. It's also one of the reasons to prefer the
>>> successors, as they don't have the same problems.
>>
>> Why do we care one iota about 32 bit CPUs? Not trolling, sincere: even
>> Raspberry Pi machines are now 64-bit, so I'm pretty unsympathetic to
>> using a much-less-vetted hash function to help out obsolete hardware.
>
> They use pretty much the same core design, just different "tuning"
> choices. There is little evidence that one is more secure than the
> other. 32bit CPUs are still quite popular, but that's a separate
> discussion.

I'm open to opt-in support for blake2s, but I strongly feel the default in the future should be blake2b: it's faster on 64-bit cores than 2s, and honestly having a choice here will probably help us have a more generalized hash-selection mechanism in hg so that when the _next_ hash crisis hits we just push a button and we're done.

>>> There is a C implementation of BLAKE3 as well. There are valid concerns
>>> about the round reduction and I'm not sure if we want to go with a
>>> custom variant. That speaks more in favor of K12. K12 can use the ARM
>>> SHA3 acceleration features.
>>
>> I very much want to stay on the "beaten path" of hash functions and, as
>> a hash function enthusiast, you've introduced me to K12, which is a
>> strike against it IMO.
>
> K12 is pretty much on the beaten path. It is essentially to SHA3
> what BLAKE2 is to BLAKE: a reduction of the number of rounds of the
> hash function and a tree hash mode to allow sponging blocks in parallel.

My crypto expert at work recommended against K12 because nobody else is using it, so it's largely "being different" for no substantial benefit.

So, how do you feel about "blake2b by default, but it's supported to pick 2s instead"?

> Joerg
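The generalized hash-selection mechanism Augie describes might be sketched as a simple registry keyed by algorithm name. All names here (`nodehash`, `_HASHES`) are hypothetical, not actual Mercurial APIs, and the 32-byte blake2b digest size is an assumed parameterization, not a settled choice:

```python
import hashlib

# Hypothetical registry: a repository requirement would name the hash,
# and supporting the *next* hash means adding one more entry here.
_HASHES = {
    "sha1": lambda data: hashlib.sha1(data),
    "blake2b": lambda data: hashlib.blake2b(data, digest_size=32),  # proposed default
    "blake2s": lambda data: hashlib.blake2s(data),                  # opt-in alternative
}

def nodehash(data, algo="blake2b"):
    """Hash `data` with the repository's configured algorithm."""
    try:
        return _HASHES[algo](data).digest()
    except KeyError:
        raise ValueError("unsupported hash algorithm: %s" % algo)

# 256-bit node ids with the assumed default:
assert len(nodehash(b"hello")) == 32
```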



Re: The SHA1 replacement plan [Hash function part]

Joerg Sonnenberger
On Wed, Aug 05, 2020 at 03:24:48PM -0400, Augie Fackler wrote:

>
>
> > On Jul 28, 2020, at 16:23, Joerg Sonnenberger <[hidden email]> wrote:
> >
> > On Tue, Jul 28, 2020 at 03:49:58PM -0400, Augie Fackler wrote:
> >>
> >>
> >>> On Jul 28, 2020, at 15:29, Joerg Sonnenberger <[hidden email]> wrote:
> >>>
> >>> On Tue, Jul 28, 2020 at 02:31:09PM -0400, Augie Fackler wrote:
> >>>>> The second most widely supported hash function would be BLAKE2s.
> >>>>
> >>>> I've been strongly favoring blake2b for years now. Why prefer s over b?
> >>>
> >>> Performance on 32bit platforms of blake2b is ...bad. blake2s works
> >>> reasonably well on both. It's also one of the reasons to prefer the
> >>> successors as they don't have the same problems.
> >>
> >> Why do we care one iota about 32 bit CPUs? Not trolling, sincere: even
> >> Raspberry Pi machines are now 64-bit, so I'm pretty unsympathetic to
> >> using a much-less-vetted hash function to help out obsolete hardware.
> >
> > They use pretty much the same core design, just different "tuning"
> > choices. There is little evidence that one is more secure than the
> > other. 32bit CPUs are still quite popular, but that's a separate
> > discussion.
>
> I'm open to opt-in support for blake2s, but I strongly feel the default
> in the future should be blake2b: it's faster on 64-bit cores than 2s,
> and honestly having a choice here will probably help us have a more
> generalized hash-selection mechanism in hg so that when the _next_ hash
> crisis hits we just push a button and we're done.

Sure, I can live with that choice. The primary concern is and was making
an informed decision and not just picking the first available choice.

> So, how do you feel about "blake2b by default, but it's supported to pick 2s instead"?

Good enough for me. Do we care about Python 2 support on Windows for
this? That's likely the only target that doesn't use OpenSSL 1.1 or a
newer Python; either of those would support blake2b out of the box.
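Whether a given interpreter can produce blake2b at all can be probed at runtime rather than assumed, which narrows the Python-2-on-Windows question to providing a fallback. A sketch; the `blake2b512` name is how OpenSSL-backed hashlib builds may expose the 64-byte variant, and the fallback strategy is illustrative:

```python
import hashlib

# Python >= 3.6 ships blake2b natively in hashlib; older interpreters
# may still expose it through a new enough OpenSSL.
if hasattr(hashlib, "blake2b"):
    h = hashlib.blake2b(b"abc", digest_size=32)
elif "blake2b512" in hashlib.algorithms_available:
    h = hashlib.new("blake2b512", b"abc")
else:
    h = None  # would need a bundled pure-Python/C fallback

print(h is not None)  # True on any Python >= 3.6
```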

Joerg

Re: The SHA1 replacement plan [Hash function part]

Augie Fackler-2
On phone, sorry for top post. I'd say making new hashes require Python 3 is eminently reasonable at this point.

