rust hg status


Valentin Gatien-Baron-2
Hello,

I wrote a fraction of hg status in rust, just the minimum needed to
compare the current revision and the working copy, with only a few of
the flags and config settings supported. As you can imagine, the goal
was better performance. Before trying to upstream bits of this, I
figured I'd check whether there's interest in this change in
particular, or in this kind of change in general (I suspect rust would
bring significant improvements to hg cat or hg files). The rest of
this mail gives more details.

While the implementation doesn't handle every uncommon situation right
and could use some serious cleanup, it's an interesting performance
improvement. In a repository with 100k tracked files and 500k ignored
files, in the best case and measuring on a good machine:

- hg-rs st takes ~50ms
- hg-rs st -mard takes ~14ms
- hg-rs st -u takes ~39ms

By contrast, hg+chg+fsmonitor's best case is 110ms regardless of
flags. Without fsmonitor, we're talking about 2.4s for hg st or hg st
-u, and 400ms for hg st -mard. As a baseline, hg st --syntax-error
takes 12ms.

A 2x speedup over fsmonitor+chg is nice. Neither best case is what you
get all the time, though, and fsmonitor degrades pretty badly, often in
hard-to-understand ways, making for an unpredictable and frequently
bad experience:
- Say you change the hgignore: the rust version will take 300ms, the
fsmonitor version 4.4s (I think 2s timeout + 2.4s regular status).
- Say you remove a directory at the root of the repository: 50ms rust
vs 4.4s fsmonitor.
- Say you haven't used a particular share in some time: you may well
see 1s rust vs 4.4s fsmonitor.

So I think there's a lot of value in making status without fsmonitor
much faster:
- significantly increase the scale at which fsmonitor becomes necessary
- improve the bad cases of fsmonitor (or even the fast path, depending
on how things are made to work together)

Regards,

Valentin Gatien-Baron


_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

Re: rust hg status

Augie Fackler-2
On Fri, Feb 15, 2019 at 02:39:44PM -0500, Valentin Gatien-Baron wrote:

> Hello,
>
> I wrote a fraction of hg status in rust, just the minimum needed to
> compare current revision and working copy with few of the flags and
> config settings supported. As you can imagine, the goal was better
> performance.  Before trying to upstream bits of this, I figured I'd
> check there's interest for this change in particular, or this kind
> of changes in general (I suspect rust would bring significant
> improvements to hg cat or hg files). The rest of this mail is more
> details.

This sounds _very_ promising and I'd love to see what you've got!

>
> While the implementation doesn't handle every uncommon situation right
> and could use some serious cleanup, it's an interesting performance
> improvement. In a repository with 100k tracked files and 500k ignored
> files, in the best case and measuring on a good machine:
>
> - hg-rs st takes ~50ms
> - hg-rs st -mard takes ~14ms
> - hg-rs st -u takes ~39ms
>
> By contrast, hg+chg+fsmonitor's best case is 110ms regardless of
> flags. Without fsmonitor, we're talking about 2.4s for hg st or hg st
> -u, and 400ms for hg st -mard. As a baseline, hg st --syntax-error
> takes 12ms.

Fascinating! Are you using re2 or Python's built-in re?


Re: rust hg status

Valentin Gatien-Baron-2


On Tue, Feb 19, 2019 at 10:46 AM Augie Fackler <[hidden email]> wrote:
> On Fri, Feb 15, 2019 at 02:39:44PM -0500, Valentin Gatien-Baron wrote:
> > Hello,
> >
> > I wrote a fraction of hg status in rust, just the minimum needed to
> > compare current revision and working copy with few of the flags and
> > config settings supported. As you can imagine, the goal was better
> > performance.  Before trying to upstream bits of this, I figured I'd
> > check there's interest for this change in particular, or this kind
> > of changes in general (I suspect rust would bring significant
> > improvements to hg cat or hg files). The rest of this mail is more
> > details.
>
> This sounds _very_ promising and I'd love to see what you've got!

Cool!
Rereading my mail, I perhaps didn't make it clear that what I have,
and what I timed below, is a pure rust executable that implements a
fraction of hg status, not a change to python hg that uses big chunks
of rust some fraction of the time. It seems that upstreaming would
take the latter approach, though, at least to start with.

> >
> > While the implementation doesn't handle every uncommon situation right
> > and could use some serious cleanup, it's an interesting performance
> > improvement. In a repository with 100k tracked files and 500k ignored
> > files, in the best case and measuring on a good machine:
> >
> > - hg-rs st takes ~50ms
> > - hg-rs st -mard takes ~14ms
> > - hg-rs st -u takes ~39ms
> >
> > By contrast, hg+chg+fsmonitor's best case is 110ms regardless of
> > flags. Without fsmonitor, we're talking about 2.4s for hg st or hg st
> > -u, and 400ms for hg st -mard. As a baseline, hg st --syntax-error
> > takes 12ms.
>
> Fascinating! Are you using re2 or Python's built-in re?

Definitely using re2. If I disable re2, the full status goes from 2.4s
to 5.7s. I didn't say how the rust implementation differs from the
python version, but using rust+re2 is not enough to get to 40ms for
finding unknown files. The rust version also has:
- optimizations to the hgignore handling (mostly special treatment of
globs that can match exactly one file)
- parallelism
- no pointless lstat'ing of untracked files on filesystems that provide
the filetype in readdir
- a cache that holds a list of "this directory is known to have no
untracked files assuming it has this timestamp, and the hgignore is
bla and the dirstate is bla", which usually shortcuts the listing of
untracked files in most directories, and thus shortcuts applying the
hgignore to such files

Even when the cache fails to help, like when the hgignore changes,
rust status takes 300ms (and it's quite plausible there's room for
improvement here; I stopped optimizing when it felt like a good enough
replacement).
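
The "known to have no untracked files" cache described above could be
sketched roughly as follows. This is only an illustration under stated
assumptions, not code from the prototype: the names CleanStamp and
CleanDirCache are hypothetical, and representing the hgignore and the
dirstate by 64-bit hashes is an assumption made to keep the example
small. A directory listing is skipped only when all three conditions
recorded for it still hold.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Conditions under which a directory was last found to contain no
/// untracked files. All field names are hypothetical.
#[derive(Clone, Debug, PartialEq, Eq)]
struct CleanStamp {
    /// Modification time of the directory when it was listed.
    dir_mtime: SystemTime,
    /// Assumed: a hash of the effective hgignore contents.
    ignore_hash: u64,
    /// Assumed: a hash identifying the dirstate.
    dirstate_hash: u64,
}

/// Maps a directory to the conditions under which it was clean.
#[derive(Default)]
struct CleanDirCache {
    entries: HashMap<PathBuf, CleanStamp>,
}

impl CleanDirCache {
    /// Remember that `dir` had no untracked files under `stamp`.
    fn record_clean(&mut self, dir: &Path, stamp: CleanStamp) {
        self.entries.insert(dir.to_path_buf(), stamp);
    }

    /// True when `dir` was recorded clean under exactly the current
    /// conditions, so listing it and applying the hgignore can be
    /// skipped. Any difference (mtime, hgignore, dirstate) is a miss.
    fn can_skip(&self, dir: &Path, current: &CleanStamp) -> bool {
        self.entries.get(dir) == Some(current)
    }
}
```

With this shape, changing the hgignore alters ignore_hash for every
directory at once, so every lookup misses and each directory has to be
listed again, which is consistent with the slower 300ms case mentioned
above.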
 



Re: rust hg status

Augie Fackler-2


> On Feb 19, 2019, at 14:43, Valentin Gatien-Baron <[hidden email]> wrote:
>
>
>
> On Tue, Feb 19, 2019 at 10:46 AM Augie Fackler <[hidden email]> wrote:
> On Fri, Feb 15, 2019 at 02:39:44PM -0500, Valentin Gatien-Baron wrote:
> > Hello,
> >
> > I wrote a fraction of hg status in rust, just the minimum needed to
> > compare current revision and working copy with few of the flags and
> > config settings supported. As you can imagine, the goal was better
> > performance.  Before trying to upstream bits of this, I figured I'd
> > check there's interest for this change in particular, or this kind
> > of changes in general (I suspect rust would bring significant
> > improvements to hg cat or hg files). The rest of this mail is more
> > details.
>
> This sounds _very_ promising and I'd love to see what you've got!
>
> Cool!
> Seeing my mail again, it's perhaps not clearly said that what I have and what I timed below is a fully rust exe that implements a fraction of hg status, not a change to python hg that uses big chunks of rust some fraction of the time. Though it seems that upstreaming would take the latter approach, at least to start with.

I'm still interested in both - we've talked on and off about a small native-binary helper for things like printing information that people like in their shell prompt.

>  
>
> >
> > While the implementation doesn't handle every uncommon situation right
> > and could use some serious cleanup, it's an interesting performance
> > improvement. In a repository with 100k tracked files and 500k ignored
> > files, in the best case and measuring on a good machine:
> >
> > - hg-rs st takes ~50ms
> > - hg-rs st -mard takes ~14ms
> > - hg-rs st -u takes ~39ms
> >
> > By contrast, hg+chg+fsmonitor's best case is 110ms regardless of
> > flags. Without fsmonitor, we're talking about 2.4s for hg st or hg st
> > -u, and 400ms for hg st -mard. As a baseline, hg st --syntax-error
> > takes 12ms.
>
> Fascinating! Are you using re2 or Python's built-in re?
>
> Definitely using re2. If I disable re2, the full status goes from 2.4s to 5.7s. I didn't say how the rust implementation differs from the python version, but using rust+re2 is not enough to get to 40ms for finding unknown files. In addition to optimizations to the hgignore handling (mostly special treatment of globs that can match exactly one file), and parallelism, and not pointlessly lstat'ing untracked files in filesystems that provide the filetype in readdir, there's a cache that holds a list of "this directory is known to have no untracked files assuming it has this timestamp, and the hgignore is bla and the dirstate is bla", which usually shortcuts the listing of untracked files in most directories, and thus shortcuts applying the hgignore on such files.
> Though even when the cache fails to help, like when the hgignore changes, rust status takes 300ms (and it's quite plausible there's room for improvement here, I stopped optimizing when it felt like a good enough replacement).

Wow, even more impressive.



Re: rust hg status

Pierre-Yves David-2
In reply to this post by Augie Fackler-2


On 2/19/19 4:46 PM, Augie Fackler wrote:

> On Fri, Feb 15, 2019 at 02:39:44PM -0500, Valentin Gatien-Baron wrote:
>> Hello,
>>
>> I wrote a fraction of hg status in rust, just the minimum needed to
>> compare current revision and working copy with few of the flags and
>> config settings supported. As you can imagine, the goal was better
>> performance.  Before trying to upstream bits of this, I figured I'd
>> check there's interest for this change in particular, or this kind
>> of changes in general (I suspect rust would bring significant
>> improvements to hg cat or hg files). The rest of this mail is more
>> details.
>
> This sounds _very_ promising and I'd love to see what you've got!

Valentin's experiment is visible in Octobus' development repository:

https://mirror.octobus.net/octobus/mercurial-devel/log?rev=c7af02%3A%3A

Valentin's initial goal was to produce a proof of concept. Now that this
experiment is a clear success (as you noticed), we (Octobus) are looking
into turning it into more production-ready code upstream. Raphaël Gomès
has started working on that (with Georges' help). He should start sending
patches for it in the coming weeks.

Cheers,

--
Pierre-Yves David
