Troubleshooting SHA1 Failures with Mercurial Repositories


Paul Boddie
Hello,

I have very recently retired a rather old computer that has been my main
development machine for a very long time, but in the last few months it has
exhibited some unreliable behaviour in various respects. There is probably an
interesting detective story here somewhere, and I welcome insights into the
underlying system issues, but my motivation for sending this message is
obviously to assess the impacts on my Mercurial repositories.

(To skip the background, jump ahead to the part specifically about Mercurial.)

The cause of this unreliable behaviour became more apparent when obtaining DVD
images to use with the new computer that will now become my development
machine. Upon running md5sum, sha1sum, sha256sum or sha512sum on the
downloaded DVD image files, it was almost impossible to generate the correct
digests. Moreover, the digests were typically different on each invocation of
the chosen program on the same file, producing something new each time. And
yet, two separately downloaded copies of the same file would compare (using
the cmp program) and be shown to be identical!
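
A minimal sketch of how such a repeatability check could be scripted, assuming Python 3 and its standard hashlib module (the tests described above used the md5sum and sha*sum command-line tools; the function names here are purely illustrative). It hashes the same file several times and collects the distinct digests seen:

```python
import hashlib


def repeated_digests(path, runs=5, algorithm="sha1", chunk_size=1 << 20):
    """Hash the same file several times and return the set of distinct digests."""
    seen = set()
    for _ in range(runs):
        h = hashlib.new(algorithm)
        with open(path, "rb") as f:
            # Read in chunks so that a large DVD image need not fit in memory.
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        seen.add(h.hexdigest())
    return seen


def is_stable(path, **kwargs):
    """True if every run produced the same digest."""
    return len(repeated_digests(path, **kwargs)) == 1
```

On a healthy machine, is_stable("image.iso") should be True for any file that is not being modified; more than one distinct digest for an unchanged file points at the computation or the data path rather than at the file itself.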

Diagnosis of the situation involved writing fairly simple programs to generate
large files with predictable but varying content and then reading them back,
which seemed to yield the expected content each time. I also investigated
other message digest tools and found that the Java-based jacksum tool did
function more reliably with MD5 digests but not all of the time. OpenSSL-based
tools did not fare any better than those which presumably use the C library
digest functions. I ran memtest86+ for some time without any indication of
memory failure, and there was no obvious indication of disk failure, although
I shall aim to run more extensive smartctl tests to be sure.
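
The write-and-read-back test can be sketched roughly as follows. This is an assumed reconstruction in Python 3, not the programs actually used; the block size and the pattern (an 8-byte block counter repeated to fill each block) are arbitrary choices for illustration:

```python
def expected_block(index, block_size=4096):
    """Content for block `index`: predictable, but different for each block."""
    pattern = index.to_bytes(8, "big")
    # block_size is assumed to be a multiple of 8 so the pattern fills it exactly.
    return pattern * (block_size // len(pattern))


def write_pattern(path, blocks, block_size=4096):
    """Write `blocks` consecutive pattern blocks to `path`."""
    with open(path, "wb") as f:
        for i in range(blocks):
            f.write(expected_block(i, block_size))


def first_bad_block(path, blocks, block_size=4096):
    """Return the index of the first block that reads back wrongly, or None."""
    with open(path, "rb") as f:
        for i in range(blocks):
            if f.read(block_size) != expected_block(i, block_size):
                return i
    return None
```

Note that a read-back shortly after writing may be served from the operating system's page cache rather than from the disk, so a clean result from a test like this does not by itself rule out a storage problem.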

Generally, I have not experienced obvious problems with my data, but I have
experienced frustration with distribution updates (Debian's apt complaining
about hash sum mismatches) and it has been largely impossible to clone large
Git repositories ("index pack failed"), although I assumed that this was just
Git making increasing demands on system capabilities (and being typically
unhelpful). I doubt that anyone else runs hardware this old - Pentium 4, 3.0
GHz, "Prescott" generation - and support for 32-bit x86 is gradually
disappearing, so I don't know what level of experience other people are likely
to have with these issues (other than remarks about the system being old and
needing replacement).

(Here comes the bit specifically related to Mercurial.)

Anyway, I find myself with Mercurial repositories that I have been updating
during periods of unreliability. On practically no occasion (or not recently,
and then maybe once) have I had a problem updating or accessing repositories,
but I wondered what kind of effects this unreliability might have had on
repository integrity. The Mercurial Wiki and other documentation do not
readily explain the implications of faulty digests, although I found the
following interesting remarks:

"The repository owner may continue committing to the heads of the repository,
but attempts to view the repository at any changeset containing the sensitive
file data will fail due to the hash mismatch (examples: hg update, hg diff, hg
annotate). "hg verify" will fail due to the hash mismatch as well."

https://www.mercurial-scm.org/wiki/CensorPlan

Now, having copied repositories to my new machine, I have successfully
verified the repositories using hg verify. However, using hg convert reveals
differing nodeids between the original and converted repositories. I have
tried hg convert with both --branchsort and --sourcesort options. Then, I have
generated readily comparable logs as follows:

hg log --template '{node}\n' > logfile

Running diff on the logs for the original and converted repositories reveals
considerable differences in nodeids for some repositories, even ones which
haven't been touched in years, but no differences for others. It appears that
--sourcesort replicates history more accurately (as suggested by the
documentation). For validation, converting the converted repositories again
(using --sourcesort) produces identical histories, as one might expect.
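
As an alternative to reading raw diff output, the point where two such logs first disagree can be located with a short script. This is only a sketch, assuming Python 3, assuming both logs were produced with the hg log command above (which lists the newest changeset first by default), and assuming the conversion kept the changesets in the same order; the function names are illustrative:

```python
def read_nodes(path):
    """Read one node ID per line from a log file, ignoring blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def first_divergence(original, converted):
    """Position, counting from the oldest changeset, where the node IDs first differ.

    Both lists are expected newest-first, as hg log prints them by default.
    Returns None if the two histories are identical.
    """
    old = list(reversed(original))
    new = list(reversed(converted))
    for rev, (a, b) in enumerate(zip(old, new)):
        if a != b:
            return rev
    if len(old) != len(new):
        # One history is a strict prefix of the other.
        return min(len(old), len(new))
    return None
```

Because each node ID is computed over the IDs of its parents, every descendant of the first divergent changeset will differ as well, so the first mismatch is the informative one; the number of differing lines that follow says little.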

I suppose I am left wondering about a few things. Are such simple comparisons
of repository histories useful in assessing the prevalence of faulty nodeids?
How may faulty nodeids affect the integrity of repositories (considering the
quote about censored changesets above)? Are there any compelling practical
arguments for converting these faulty repositories if they otherwise function
apparently normally? (I realise that combining faulty and converted
repositories will result in divergence in the graph at inappropriate places.)

Sorry for the long message, but any insights would be much appreciated!

Paul


_______________________________________________
Mercurial mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial

Re: Troubleshooting SHA1 Failures with Mercurial Repositories

Augie Fackler-2


> On Jun 13, 2020, at 9:24 AM, Paul Boddie <[hidden email]> wrote:
>
> I suppose I am left wondering about a few things. Are such simple comparisons
> of repository histories useful in assessing the prevalence of faulty nodeids?
> How may faulty nodeids affect the integrity of repositories (considering the
> quote about censored changesets above)? Are there any compelling practical
> arguments for converting these faulty repositories if they otherwise function
> apparently normally? (I realise that combining faulty and converted
> repositories will result in divergence in the graph at inappropriate places.)

Over the years we’ve gotten a lot pickier about ordering of metadata in changeset objects we produce. My guess is that if the original repo passes `hg verify` nothing is wrong in the source repo, and that the differences you’re seeing are entirely metadata-ordering related (which is to say harmless).

Was there any specific thing that motivated using `hg convert`?

>
> Sorry for the long message, but any insights would be much appreciated!
>
> Paul

Re: Troubleshooting SHA1 Failures with Mercurial Repositories

Paul Boddie
On Saturday, 13 June 2020 22:31:06 CEST Augie Fackler wrote:

> > On Jun 13, 2020, at 9:24 AM, Paul Boddie <[hidden email]> wrote:
> >
> > I suppose I am left wondering about a few things. Are such simple
> > comparisons of repository histories useful in assessing the prevalence of
> > faulty nodeids? How may faulty nodeids affect the integrity of
> > repositories (considering the quote about censored changesets above)? Are
> > there any compelling practical arguments for converting these faulty
> > repositories if they otherwise function apparently normally? (I realise
> > that combining faulty and converted repositories will result in
> > divergence in the graph at inappropriate places.)
>
> Over the years we’ve gotten a lot pickier about ordering of metadata in
> changeset objects we produce. My guess is that if the original repo passes
> `hg verify` nothing is wrong in the source repo, and that the differences
> you’re seeing are entirely metadata-ordering related (which is to say
> harmless).

So is the page about censored changesets now inaccurate with regard to nodeids
causing some kind of failure if they do not "encode" the stored content
according to the fundamental rules of Mercurial? Or did I misunderstand the
intended message of that text? It sounds like there is nothing corrupt with
regard to the stored content, merely the metadata (which happened to be used
to construct the history initially) that is corrupt in some way.

> Was there any specific thing that motivated using `hg convert`?

I think the wiki mentions it as a tool to investigate repository corruption.
My reasoning was that repository conversion using "hg convert" would rebuild
the history and recompute the nodeids. In doing so in an environment where
SHA1 libraries are not generating something arbitrary, I figured that I would
obtain the "true" nodeids and see where they diverged in the original history
from what they should have been.

Obviously, if there is a better way of "replaying" the history to see where
and when the nodeids became bad, I would like to hear about it.

Thanks for the reply!

Paul



Re: Troubleshooting SHA1 Failures with Mercurial Repositories

Augie Fackler-2


> On Jun 13, 2020, at 4:47 PM, Paul Boddie <[hidden email]> wrote:
>
> On Saturday, 13 June 2020 22:31:06 CEST Augie Fackler wrote:
>>> On Jun 13, 2020, at 9:24 AM, Paul Boddie <[hidden email]> wrote:
>>>
>>> I suppose I am left wondering about a few things. Are such simple
>>> comparisons of repository histories useful in assessing the prevalence of
>>> faulty nodeids? How may faulty nodeids affect the integrity of
>>> repositories (considering the quote about censored changesets above)? Are
>>> there any compelling practical arguments for converting these faulty
>>> repositories if they otherwise function apparently normally? (I realise
>>> that combining faulty and converted repositories will result in
>>> divergence in the graph at inappropriate places.)
>>
>> Over the years we’ve gotten a lot pickier about ordering of metadata in
>> changeset objects we produce. My guess is that if the original repo passes
>> `hg verify` nothing is wrong in the source repo, and that the differences
>> you’re seeing are entirely metadata-ordering related (which is to say
>> harmless).
>
> So is the page about censored changesets now inaccurate with regard to nodeids
> causing some kind of failure if they do not "encode" the stored content
> according to the fundamental rules of Mercurial? Or did I misunderstand the
> intended message of that text? It sounds like there is nothing corrupt with
> regard to the stored content, merely the metadata (which happened to be used
> to construct the history initially) that is corrupt in some way.

More nuanced than that, actually. The censor page is only relevant if you’ve used censor, which you’d know. As I said, if you’ve got a repo passing `hg verify` then it’s definitely _not_ corrupt. The metadata ordering can change and that’ll (by nature of a content-addressed system) change the node ID, but the content is the same. E.g.:

{'branch': 'foo', 'rebase_src': 'some_hash_here'}
{'rebase_src': 'some_hash_here', 'branch': 'foo'}

are the same key-value pairs, but in different order. If they are stored in hg in different orders, you’ll get different hashes, and if you use `hg convert` on an older repository the ordering of key/value pairs in various metadata regions will get normalized, which changes hashes.
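
A small illustration of the effect, sketched in Python 3. The node_id function follows what is, to my understanding, the way Mercurial hashes a revision (SHA-1 over the two parent IDs, smaller first, then the revision text). The encode_extra function is only a stand-in: a real changeset also carries the manifest ID, user, date, file list and description, and the exact byte layout of the metadata is not reproduced here.

```python
import hashlib

NULL_ID = b"\0" * 20  # the "no parent" node ID


def node_id(text, p1=NULL_ID, p2=NULL_ID):
    """SHA-1 over the two parent IDs (smaller first) followed by the revision text."""
    h = hashlib.sha1(min(p1, p2))
    h.update(max(p1, p2))
    h.update(text)
    return h.hexdigest()


def encode_extra(pairs):
    """Illustrative only: serialise key/value pairs in the order they are given."""
    return b"\0".join(f"{k}:{v}".encode() for k, v in pairs)


pairs_a = [("branch", "foo"), ("rebase_src", "some_hash_here")]
pairs_b = list(reversed(pairs_a))

# The same key/value pairs...
print(dict(pairs_a) == dict(pairs_b))  # True
# ...but different bytes on disk, hence a different node ID.
print(node_id(encode_extra(pairs_a)) == node_id(encode_extra(pairs_b)))  # False
```

The parent IDs feed into the hash too, so once one changeset's ID changes, the IDs of all its descendants change with it, even where their own metadata is byte-for-byte the same.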

>
>> Was there any specific thing that motivated using `hg convert`?
>
> I think the wiki mentions it as a tool to investigate repository corruption.
> My reasoning was that repository conversion using "hg convert" would rebuild
> the history and recompute the nodeids. In doing so in an environment where
> SHA1 libraries are not generating something arbitrary, I figured that I would
> obtain the "true" nodeids and see where they diverged in the original history
> from what they should have been.

Ah. Convert is a useful tool for recovering from corrupt repos, but it doesn’t sound like you’ve got any.

>
> Obviously, if there is a better way of "replaying" the history to see where
> and when the nodeids became bad, I would like to hear about it.
>
> Thanks for the reply!
>
> Paul

Re: Troubleshooting SHA1 Failures with Mercurial Repositories

Paul Boddie
On Saturday, 13 June 2020 22:53:19 CEST Augie Fackler wrote:
>
> More nuanced than that, actually. The censor page is only relevant if you’ve
> used censor, which you’d know.

I wasn't sure about this, due to the phrasing which, in various cases, sounds
as if it is referencing general Mercurial design principles and behaviours.
Plus scary wording that I thought might apply in my own situation, such as...

  "hg verify" will fail due to the hash mismatch as well.

> As I said, if you’ve got a repo passing `hg verify` then it’s definitely
> _not_ corrupt. The metadata ordering can change and that’ll (by nature of a
> content-addressed system) change the node ID, but the content is the same.
> E.g.:
>
> {'branch': 'foo', 'rebase_src': 'some_hash_here'}
> {'rebase_src': 'some_hash_here', 'branch': 'foo'}
>
> are the same key-value pairs, but in different order. If they are stored in
> hg in different orders, you’ll get different hashes, and if you use `hg
> convert` on an older repository the ordering of key/value pairs in various
> metadata regions will get normalized, which changes hashes.

So, I imagine that this normalisation might explain why some repositories
diverge from their newly-converted form very early in their histories, well
before I can believe that this problem with digest generation might have
arisen.

[...]

> Ah. Convert is a useful tool for recovering from corrupt repos, but it
> doesn’t sound like you’ve got any.

This is reassuring to learn. I imagine that there is little to be gained from
recomputing any faulty digests, given that their role, faulty or otherwise, is
to provide convenient but opaque references that give the history its
structure.

Thanks once again for following up!

Paul

P.S. I did a "long" test with smartctl which yielded no errors on the disk
where my repositories were stored, so I am still inclined to think that the
disk itself is not the cause of these problems.

