Quantcast

RFC: bitmap storage for precursors and phases

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
26 messages Options
12
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

RFC: bitmap storage for precursors and phases

Stanislau Hlebik

Last year Durham sent a proposal for bitmap storage for computehidden() https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-September/087860.html.

 

It got a few useful comments, two most important comments:

  1. Use bitmaps for lower-level data structures, for example, bitmap for public commits and bitmap for commits that are affected by obsolescence markers
  2. Add cache validation checks

 

This is a new RFC that addresses these issues.

  1. In the code, there is a cache only for precursors. Later I'm planning to add cache for public commits.
  2. Cache validation key was added. It can be any key. For precursorcache I've chosen obsstore file size and mtime and cache validation key. For phasecache we can use hash of sorted phaseroots file since this file is usually small. 

 

 

In a few places, I decided to delete cache entirely:

  1. Caches are blown up if we receive obsmarkers via bundle2 or exchange._pullobsolete(). This is not necessary since cache won't be valid on the next run anyway because obsstore size and/or mtime was changed. We can change it.
  2. Obsstore can store markers for node that are not present in the repo. Since bitmap is basically a dict from rev to boolean it can't store markers for revisions that are not in the repo. It doesn't cause issues unless missing node is received from bundle or during pull. In this case precursorcache doesn't recognize it as obsoleted. To fix it cache is deleted during changegroup.apply() if we receive a node that is in obsstore.successors dict.
  3. Similar thing in bundlerepo - new obsoleted nodes may have been added, let's just not use precursorcache in bundlerepo.

 

 

The code is here: https://bitbucket.org/stashlebik/hg/commits/branch/hiddenbitmaps

It's not super clean yet, but all tests pass (except maybe for commit/code checks).

Please look and let me know what you think!

 

 

Thanks,

Stas


_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFC: bitmap storage for precursors and phases

Jun Wu
In general, I think this is a good direction. Some random thoughts:

  - general purposed

    I think the bitmap is not always a cache, so it should only have
    operations like set/unset/readfromdisk/writetodisk. Practically, I won't
    couple cache invalidation with the bitmap implementation.

    In additional, I'll try to avoid using Python-only types in the
    interface. So once we decide to rewrite part of the implementation in
    native C, we won't have trouble.

    See "revset" below for a possibility that bitmap is used as a non-set.

  - revset

    This is a possibility that probably won't happen any time soon.

    The revset currently uses Python set for maintaining its state. For huge
    sets, Python sets may not be a good option. And various operations could
    benefit from an always-topologically-sorted set, which is the bitmap.

  - mmap
   
    My intuition is that bitmaps fit better with mmap which can reduce the
    reading loading cost. I think "vfs.mmapread" could be a thing, and
    various places can benefit from it - Gabor recently showed interest in
    loading revlog data by mmap, I had patches that uses mmap to read revlog
    index.

In additional, not directly related to this series, I'm a big fan of
single direction data flow. But the current code base does not seem to do a
good job in this area. As we are adding more caching layers to the code
base, it'd be nice if we have some tiny framework maintaining the dependency
of all kinds of data, to be able to understand the data flow easily, and
just to be more confident about loading orders. I think people more
experienced on architecture may want to share some ideas here.

Excerpts from Stanislau Hlebik's message of 2017-02-10 17:33:28 +0000:

> Last year Durham sent a proposal for bitmap storage for computehidden() https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-September/087860.html .
>
> It got a few useful comments, two most important comments:
>
>   1.  Use bitmaps for lower-level data structures, for example, bitmap for public commits and bitmap for commits that are affected by obsolescence markers
>   2.  Add cache validation checks
>
> This is a new RFC that addresses these issues.
>
>   1.  In the code, there is a cache only for precursors. Later I'm planning to add cache for public commits.
>   2.  Cache validation key was added. It can be any key. For precursorcache I've chosen obsstore file size and mtime and cache validation key. For phasecache we can use hash of sorted phaseroots file since this file is usually small.
>
>
> In a few places, I decided to delete cache entirely:
>
>   1.  Caches are blown up if we receive obsmarkers via bundle2 or exchange._pullobsolete(). This is not necessary since cache won't be valid on the next run anyway because obsstore size and/or mtime was changed. We can change it.
>   2.  Obsstore can store markers for node that are not present in the repo. Since bitmap is basically a dict from rev to boolean it can't store markers for revisions that are not in the repo. It doesn't cause issues unless missing node is received from bundle or during pull. In this case precursorcache doesn't recognize it as obsoleted. To fix it cache is deleted during changegroup.apply() if we receive a node that is in obsstore.successors dict.
>   3.  Similar thing in bundlerepo - new obsoleted nodes may have been added, let's just not use precursorcache in bundlerepo.
>
>
> The code is here: https://bitbucket.org/stashlebik/hg/commits/branch/hiddenbitmaps 
> It's not super clean yet, but all tests pass (except maybe for commit/code checks).
> Please look and let me know what you think!
>
>
> Thanks,
> Stas
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFC: bitmap storage for precursors and phases

Stanislau Hlebik
Excerpts from Jun Wu's message of 2017-02-10 11:04:24 -0800:

> In general, I think this is a good direction. Some random thoughts:
>
>   - general purposed
>
>     I think the bitmap is not always a cache, so it should only have
>     operations like set/unset/readfromdisk/writetodisk. Practically, I won't
>     couple cache invalidation with the bitmap implementation.
>
>     In additional, I'll try to avoid using Python-only types in the
>     interface. So once we decide to rewrite part of the implementation in
>     native C, we won't have trouble.

I can decouple them into bitmap and bitmapcache. bitmap then need to
have an generic header. bitmapcache will store invalidation key there.

>
>     See "revset" below for a possibility that bitmap is used as a non-set.
>
>   - revset
>
>     This is a possibility that probably won't happen any time soon.
>
>     The revset currently uses Python set for maintaining its state. For huge
>     sets, Python sets may not be a good option. And various operations could
>     benefit from an always-topologically-sorted set, which is the bitmap.
>
>   - mmap
>    
>     My intuition is that bitmaps fit better with mmap which can reduce the
>     reading loading cost. I think "vfs.mmapread" could be a thing, and
>     various places can benefit from it - Gabor recently showed interest in
>     loading revlog data by mmap, I had patches that uses mmap to read revlog
>     index.

Bitmap files are going to be small and reading them all in memory should
be fast. But if it'll ever become a bottleneck we can add mmap read of
bitmap files.

>
> In additional, not directly related to this series, I'm a big fan of
> single direction data flow. But the current code base does not seem to do a
> good job in this area. As we are adding more caching layers to the code
> base, it'd be nice if we have some tiny framework maintaining the dependency
> of all kinds of data, to be able to understand the data flow easily, and
> just to be more confident about loading orders. I think people more
> experienced on architecture may want to share some ideas here.
>
> Excerpts from Stanislau Hlebik's message of 2017-02-10 17:33:28 +0000:
> > Last year Durham sent a proposal for bitmap storage for computehidden() https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-September/087860.html .
> >
> > It got a few useful comments, two most important comments:
> >
> >   1.  Use bitmaps for lower-level data structures, for example, bitmap for public commits and bitmap for commits that are affected by obsolescence markers
> >   2.  Add cache validation checks
> >
> > This is a new RFC that addresses these issues.
> >
> >   1.  In the code, there is a cache only for precursors. Later I'm planning to add cache for public commits.
> >   2.  Cache validation key was added. It can be any key. For precursorcache I've chosen obsstore file size and mtime and cache validation key. For phasecache we can use hash of sorted phaseroots file since this file is usually small.
> >
> >
> > In a few places, I decided to delete cache entirely:
> >
> >   1.  Caches are blown up if we receive obsmarkers via bundle2 or exchange._pullobsolete(). This is not necessary since cache won't be valid on the next run anyway because obsstore size and/or mtime was changed. We can change it.
> >   2.  Obsstore can store markers for node that are not present in the repo. Since bitmap is basically a dict from rev to boolean it can't store markers for revisions that are not in the repo. It doesn't cause issues unless missing node is received from bundle or during pull. In this case precursorcache doesn't recognize it as obsoleted. To fix it cache is deleted during changegroup.apply() if we receive a node that is in obsstore.successors dict.
> >   3.  Similar thing in bundlerepo - new obsoleted nodes may have been added, let's just not use precursorcache in bundlerepo.
> >
> >
> > The code is here: https://bitbucket.org/stashlebik/hg/commits/branch/hiddenbitmaps 
> > It's not super clean yet, but all tests pass (except maybe for commit/code checks).
> > Please look and let me know what you think!
> >
> >
> > Thanks,
> > Stas
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFC: bitmap storage for precursors and phases

Sean Farley-3
In reply to this post by Jun Wu
Jun Wu <[hidden email]> writes:

> In general, I think this is a good direction. Some random thoughts:
>
>   - general purposed
>
>     I think the bitmap is not always a cache, so it should only have
>     operations like set/unset/readfromdisk/writetodisk. Practically, I won't
>     couple cache invalidation with the bitmap implementation.
>
>     In additional, I'll try to avoid using Python-only types in the
>     interface. So once we decide to rewrite part of the implementation in
>     native C, we won't have trouble.
>
>     See "revset" below for a possibility that bitmap is used as a non-set.
>
>   - revset
>
>     This is a possibility that probably won't happen any time soon.
>
>     The revset currently uses Python set for maintaining its state. For huge
>     sets, Python sets may not be a good option. And various operations could
>     benefit from an always-topologically-sorted set, which is the bitmap.
>
>   - mmap
>    
>     My intuition is that bitmaps fit better with mmap which can reduce the
>     reading loading cost. I think "vfs.mmapread" could be a thing, and
>     various places can benefit from it - Gabor recently showed interest in
>     loading revlog data by mmap, I had patches that uses mmap to read revlog
>     index.
>
> In additional, not directly related to this series, I'm a big fan of
> single direction data flow. But the current code base does not seem to do a
> good job in this area. As we are adding more caching layers to the code
> base, it'd be nice if we have some tiny framework maintaining the dependency
> of all kinds of data, to be able to understand the data flow easily, and
> just to be more confident about loading orders. I think people more
> experienced on architecture may want to share some ideas here.

I was thinking about a more high-level approach (please feel free to
pick apart):

r = repo.filtered("bitmap1")
r2 = r.filtered("bitmap2")

So that r2 would be an intersection of bitmap1 and bitmap2 (haven't
thought about a union nor the inverse).
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFC: bitmap storage for precursors and phases

Jun Wu
Excerpts from Sean Farley's message of 2017-02-13 17:04:35 -0800:
> I was thinking about a more high-level approach (please feel free to
> pick apart):
>
> r = repo.filtered("bitmap1")
> r2 = r.filtered("bitmap2")
>
> So that r2 would be an intersection of bitmap1 and bitmap2 (haven't
> thought about a union nor the inverse).

That does not conflict with my comments. It could be implemented as nested
filters, or flatten the bitmap by doing an "or" operation.
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFC: bitmap storage for precursors and phases

Sean Farley-3
Jun Wu <[hidden email]> writes:

> Excerpts from Sean Farley's message of 2017-02-13 17:04:35 -0800:
>> I was thinking about a more high-level approach (please feel free to
>> pick apart):
>>
>> r = repo.filtered("bitmap1")
>> r2 = r.filtered("bitmap2")
>>
>> So that r2 would be an intersection of bitmap1 and bitmap2 (haven't
>> thought about a union nor the inverse).
>
> That does not conflict with my comments. It could be implemented as nested
> filters, or flatten the bitmap by doing an "or" operation.

Righto. Just wanted to bring it up early before things are set in stone.
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFC: bitmap storage for precursors and phases

Gregory Szorc
In reply to this post by Stanislau Hlebik


On Feb 10, 2017, at 10:33, Stanislau Hlebik <[hidden email]> wrote:

Last year Durham sent a proposal for bitmap storage for computehidden() https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-September/087860.html.

 

It got a few useful comments, two most important comments:

  1. Use bitmaps for lower-level data structures, for example, bitmap for public commits and bitmap for commits that are affected by obsolescence markers
  2. Add cache validation checks

 

This is a new RFC that addresses these issues.

  1. In the code, there is a cache only for precursors. Later I'm planning to add cache for public commits.
  2. Cache validation key was added. It can be any key. For precursorcache I've chosen obsstore file size and mtime and cache validation key. For phasecache we can use hash of sorted phaseroots file since this file is usually small. 

Until there is a better supported mechanism for storing "draft pushes" in an external store (Facebook's inifinitepush extension may fulfill this), Mozilla's "Try" and code review repos need to scale to >100,000 heads / phase roots. So a hash of phaseroots may not scale. (Of course, one solution to this problem would be inverting phases storage to be head based instead of root based. That doesn't solve problem if you have 100k public heads and 100k draft heads. Scaling is hard.)

 

In a few places, I decided to delete cache entirely:

  1. Caches are blown up if we receive obsmarkers via bundle2 or exchange._pullobsolete(). This is not necessary since cache won't be valid on the next run anyway because obsstore size and/or mtime was changed. We can change it.
  2. Obsstore can store markers for node that are not present in the repo. Since bitmap is basically a dict from rev to boolean it can't store markers for revisions that are not in the repo. It doesn't cause issues unless missing node is received from bundle or during pull. In this case precursorcache doesn't recognize it as obsoleted. To fix it cache is deleted during changegroup.apply() if we receive a node that is in obsstore.successors dict.
  3. Similar thing in bundlerepo - new obsoleted nodes may have been added, let's just not use precursorcache in bundlerepo.

 

 

The code is here: https://bitbucket.org/stashlebik/hg/commits/branch/hiddenbitmaps

It's not super clean yet, but all tests pass (except maybe for commit/code checks).

Please look and let me know what you think!

 

 

Thanks,

Stas

_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFC: bitmap storage for precursors and phases

Stanislau Hlebik
In reply to this post by Sean Farley-3
Excerpts from Sean Farley's message of 2017-02-13 18:30:25 -0800:

> Jun Wu <[hidden email]> writes:
>
> > Excerpts from Sean Farley's message of 2017-02-13 17:04:35 -0800:
> >> I was thinking about a more high-level approach (please feel free to
> >> pick apart):
> >>
> >> r = repo.filtered("bitmap1")
> >> r2 = r.filtered("bitmap2")
> >>
> >> So that r2 would be an intersection of bitmap1 and bitmap2 (haven't
> >> thought about a union nor the inverse).
> >
> > That does not conflict with my comments. It could be implemented as nested
> > filters, or flatten the bitmap by doing an "or" operation.
>
> Righto. Just wanted to bring it up early before things are set in stone.

Current `repoview.filtered()` implementation can apply only one filter. I
think it will be error-prone to change it, won't it?
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFC: bitmap storage for precursors and phases

Stanislau Hlebik
In reply to this post by Gregory Szorc
Excerpts from Gregory Szorc's message of 2017-02-13 20:28:37 -0700:

>
> > On Feb 10, 2017, at 10:33, Stanislau Hlebik <[hidden email]> wrote:
> >
> > Last year Durham sent a proposal for bitmap storage for computehidden() https://urldefense.proofpoint.com/v2/url?u=https-3A__www.mercurial-2Dscm.org_pipermail_mercurial-2Ddevel_2016-2DSeptember_087860.html&d=DwIFAg&c=5VD0RTtNlTh3ycd41b3MUw&r=1EQ58Dmb5uX1qHujcsT1Mg&m=ee96J8IDM8uhZeONo6skPSXItouNwkeRGNxHeCaXiXM&s=biY_gtRGCl25baAE_Nlw8cl4YUefBM0Xb7L0kuGklRA&e= .
> >  
> > It got a few useful comments, two most important comments:
> > Use bitmaps for lower-level data structures, for example, bitmap for public commits and bitmap for commits that are affected by obsolescence markers
> > Add cache validation checks
> >  
> > This is a new RFC that addresses these issues.
> > In the code, there is a cache only for precursors. Later I'm planning to add cache for public commits.
> > Cache validation key was added. It can be any key. For precursorcache I've chosen obsstore file size and mtime and cache validation key. For phasecache we can use hash of sorted phaseroots file since this file is usually small.
>
> Until there is a better supported mechanism for storing "draft pushes" in an external store (Facebook's inifinitepush extension may fulfill this), Mozilla's "Try" and code review repos need to scale to >100,000 heads / phase roots. So a hash of phaseroots may not scale. (Of course, one solution to this problem would be inverting phases storage to be head based instead of root based. That doesn't solve problem if you have 100k public heads and 100k draft heads. Scaling is hard.)

Then I will use the same cache key as for 'obsstore' - mtime + size.
BTW infinitepush is ready and opensourced so you can give it a try:
https://bitbucket.org/facebook/hg-experimental/src/567c1477eebf43ff5fe7412c650ec49aa3e44a3c/infinitepush/?at=default

>
> >  
> > In a few places, I decided to delete cache entirely:
> > Caches are blown up if we receive obsmarkers via bundle2 or exchange._pullobsolete(). This is not necessary since cache won't be valid on the next run anyway because obsstore size and/or mtime was changed. We can change it.
> > Obsstore can store markers for node that are not present in the repo. Since bitmap is basically a dict from rev to boolean it can't store markers for revisions that are not in the repo. It doesn't cause issues unless missing node is received from bundle or during pull. In this case precursorcache doesn't recognize it as obsoleted. To fix it cache is deleted during changegroup.apply() if we receive a node that is in obsstore.successors dict.
> > Similar thing in bundlerepo - new obsoleted nodes may have been added, let's just not use precursorcache in bundlerepo.
> >  
> >  
> > The code is here: https://urldefense.proofpoint.com/v2/url?u=https-3A__bitbucket.org_stashlebik_hg_commits_branch_hiddenbitmaps&d=DwIFAg&c=5VD0RTtNlTh3ycd41b3MUw&r=1EQ58Dmb5uX1qHujcsT1Mg&m=ee96J8IDM8uhZeONo6skPSXItouNwkeRGNxHeCaXiXM&s=dp9GiK3_WKq2zB8mVuMy2ksPF5tnGV_1T_OSAp6jD8o&e= 
> > It's not super clean yet, but all tests pass (except maybe for commit/code checks).
> > Please look and let me know what you think!
> >  
> >  
> > Thanks,
> > Stas
> > _______________________________________________
> > Mercurial-devel mailing list
> > [hidden email]
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.mercurial-2Dscm.org_mailman_listinfo_mercurial-2Ddevel&d=DwIFAg&c=5VD0RTtNlTh3ycd41b3MUw&r=1EQ58Dmb5uX1qHujcsT1Mg&m=ee96J8IDM8uhZeONo6skPSXItouNwkeRGNxHeCaXiXM&s=mqvxWBluu-CxyLDxFLW49s61VhbzU_NiqHHCSu6NVxw&e= 
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFC: bitmap storage for precursors and phases

Stanislau Hlebik
In reply to this post by Stanislau Hlebik
Excerpts from Stanislau Hlebik's message of 2017-02-14 09:29:25 +0000:

> Excerpts from Sean Farley's message of 2017-02-13 18:30:25 -0800:
> > Jun Wu <[hidden email]> writes:
> >
> > > Excerpts from Sean Farley's message of 2017-02-13 17:04:35 -0800:
> > >> I was thinking about a more high-level approach (please feel free to
> > >> pick apart):
> > >>
> > >> r = repo.filtered("bitmap1")
> > >> r2 = r.filtered("bitmap2")
> > >>
> > >> So that r2 would be an intersection of bitmap1 and bitmap2 (haven't
> > >> thought about a union nor the inverse).
> > >
> > > That does not conflict with my comments. It could be implemented as nested
> > > filters, or flatten the bitmap by doing an "or" operation.
> >
> > Righto. Just wanted to bring it up early before things are set in stone.
>
> Current `repoview.filtered()` implementation can apply only one filter. I
> think it will be error-prone to change it, won't it?

It seems that it's better to use sorted lists instead of bitmaps. In a
couple of places it is expected that repo.filteredrevs supports
iteration. But iteration over a bitmap is very slow. Instead we can
store list of non-public revs and list of precursor revs and load them
in a set.

It will be slow for the case where repo has lots of draft commits.
In this case it's probably better to disable this feature completely.
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFC: bitmap storage for precursors and phases

Jun Wu
Excerpts from Stanislau Hlebik's message of 2017-02-16 19:39:07 +0000:

> Excerpts from Stanislau Hlebik's message of 2017-02-14 09:29:25 +0000:
> > Excerpts from Sean Farley's message of 2017-02-13 18:30:25 -0800:
> > > Jun Wu <[hidden email]> writes:
> > >
> > > > Excerpts from Sean Farley's message of 2017-02-13 17:04:35 -0800:
> > > >> I was thinking about a more high-level approach (please feel free to
> > > >> pick apart):
> > > >>
> > > >> r = repo.filtered("bitmap1")
> > > >> r2 = r.filtered("bitmap2")
> > > >>
> > > >> So that r2 would be an intersection of bitmap1 and bitmap2 (haven't
> > > >> thought about a union nor the inverse).
> > > >
> > > > That does not conflict with my comments. It could be implemented as nested
> > > > filters, or flatten the bitmap by doing an "or" operation.
> > >
> > > Righto. Just wanted to bring it up early before things are set in stone.
> >
> > Current `repoview.filtered()` implementation can apply only one filter. I
> > think it will be error-prone to change it, won't it?
>
> It seems that it's better to use sorted lists instead of bitmaps. In a
> couple of places it is expected that repo.filteredrevs supports
> iteration. But iteration over a bitmap is very slow. Instead we can
> store list of non-public revs and list of precursor revs and load them
> in a set.
>
> It will be slow for the case where repo has lots of draft commits.
> In this case it's probably better to disable this feature completely.

I think it's fine to have slow iteration, as long as the iteration operation
is uncommon. The point is, the most common operation should be fast - that
is, ishidden(rev) should be fast. Stateful chg will help the read-only case,
the bitmap + mmap would ideally make write (like rebase) cases fast.

Some code can be changed - "scmutil.filteredhash" seems to be one user that
iterates "filteredrevs". But what it needs is only a hash - it could hash
something else, like the mtime, size etc.

Bitmaps could also be smarter - like maintaining the min and max revisions
so it does not need to be exactly len(repo).
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFC: bitmap storage for precursors and phases

Stanislau Hlebik
Excerpts from Jun Wu's message of 2017-02-16 13:42:46 -0800:

> Excerpts from Stanislau Hlebik's message of 2017-02-16 19:39:07 +0000:
> > Excerpts from Stanislau Hlebik's message of 2017-02-14 09:29:25 +0000:
> > > Excerpts from Sean Farley's message of 2017-02-13 18:30:25 -0800:
> > > > Jun Wu <[hidden email]> writes:
> > > >
> > > > > Excerpts from Sean Farley's message of 2017-02-13 17:04:35 -0800:
> > > > >> I was thinking about a more high-level approach (please feel free to
> > > > >> pick apart):
> > > > >>
> > > > >> r = repo.filtered("bitmap1")
> > > > >> r2 = r.filtered("bitmap2")
> > > > >>
> > > > >> So that r2 would be an intersection of bitmap1 and bitmap2 (haven't
> > > > >> thought about a union nor the inverse).
> > > > >
> > > > > That does not conflict with my comments. It could be implemented as nested
> > > > > filters, or flatten the bitmap by doing an "or" operation.
> > > >
> > > > Righto. Just wanted to bring it up early before things are set in stone.
> > >
> > > Current `repoview.filtered()` implementation can apply only one filter. I
> > > think it will be error-prone to change it, won't it?
> >
> > It seems that it's better to use sorted lists instead of bitmaps. In a
> > couple of places it is expected that repo.filteredrevs supports
> > iteration. But iteration over a bitmap is very slow. Instead we can
> > store list of non-public revs and list of precursor revs and load them
> > in a set.
> >
> > It will be slow for the case where repo has lots of draft commits.
> > In this case it's probably better to disable this feature completely.
>
> I think it's fine to have slow iteration, as long as the iteration operation
> is uncommon. The point is, the most common operation should be fast - that
> is, ishidden(rev) should be fast. Stateful chg will help the read-only case,
> the bitmap + mmap would ideally make write (like rebase) cases fast
.
As I said before we will load all non-public revs in one set and all
precursor revs in another set. So `ispublic()` and `isprecursor()`
checks are very fast, it's just a set lookup. Same with update - it's
just an insertion in the set.

>
> Some code can be changed - "scmutil.filteredhash" seems to be one user that
> iterates "filteredrevs". But what it needs is only a hash - it could hash
> something else, like the mtime, size etc.

Bookmarks, changelog, obsstore and tags can affect filtered set.
For filtered repo we'll need to use size + mtime of bookmarks,
changelog, obsstore, tags and maybe even smth else. That maybe
error-prone.
>
> Bitmaps could also be smarter - like maintaining the min and max revisions
> so it does not need to be exactly len(repo).
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFC: bitmap storage for precursors and phases

Stanislau Hlebik
Excerpts from Stanislau Hlebik's message of 2017-02-17 11:24:34 +0000:

> Excerpts from Jun Wu's message of 2017-02-16 13:42:46 -0800:
> > Excerpts from Stanislau Hlebik's message of 2017-02-16 19:39:07 +0000:
> > > Excerpts from Stanislau Hlebik's message of 2017-02-14 09:29:25 +0000:
> > > > Excerpts from Sean Farley's message of 2017-02-13 18:30:25 -0800:
> > > > > Jun Wu <[hidden email]> writes:
> > > > >
> > > > > > Excerpts from Sean Farley's message of 2017-02-13 17:04:35 -0800:
> > > > > >> I was thinking about a more high-level approach (please feel free to
> > > > > >> pick apart):
> > > > > >>
> > > > > >> r = repo.filtered("bitmap1")
> > > > > >> r2 = r.filtered("bitmap2")
> > > > > >>
> > > > > >> So that r2 would be an intersection of bitmap1 and bitmap2 (haven't
> > > > > >> thought about a union nor the inverse).
> > > > > >
> > > > > > That does not conflict with my comments. It could be implemented as nested
> > > > > > filters, or flatten the bitmap by doing an "or" operation.
> > > > >
> > > > > Righto. Just wanted to bring it up early before things are set in stone.
> > > >
> > > > Current `repoview.filtered()` implementation can apply only one filter. I
> > > > think it will be error-prone to change it, won't it?
> > >
> > > It seems that it's better to use sorted lists instead of bitmaps. In a
> > > couple of places it is expected that repo.filteredrevs supports
> > > iteration. But iteration over a bitmap is very slow. Instead we can
> > > store list of non-public revs and list of precursor revs and load them
> > > in a set.
> > >
> > > It will be slow for the case where repo has lots of draft commits.
> > > In this case it's probably better to disable this feature completely.
> >
> > I think it's fine to have slow iteration, as long as the iteration operation
> > is uncommon. The point is, the most common operation should be fast - that
> > is, ishidden(rev) should be fast. Stateful chg will help the read-only case,
> > the bitmap + mmap would ideally make write (like rebase) cases fast
> .
> As I said before we will load all non-public revs in one set and all
> precursor revs in another set. So `ispublic()` and `isprecursor()`
> checks are very fast, it's just a set lookup. Same with update - it's
> just an insertion in the set.
>
> >
> > Some code can be changed - "scmutil.filteredhash" seems to be one user that
> > iterates "filteredrevs". But what it needs is only a hash - it could hash
> > something else, like the mtime, size etc.
>
> Bookmarks, changelog, obsstore and tags can affect filtered set.
> For filtered repo we'll need to use size + mtime of bookmarks,
> changelog, obsstore, tags and maybe even smth else. That maybe
> error-prone.

This is implementation of two caches (nonpublic + precursor) using
serialized sorted lists and sets
https://bitbucket.org/stashlebik/hg/commits/99879579ac2848a2567810b677d8344150a7b319?at=hiddenbitmaps_lists

> >
> > Bitmaps could also be smarter - like maintaining the min and max revisions
> > so it does not need to be exactly len(repo).
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFC: bitmap storage for precursors and phases

Jun Wu
In reply to this post by Stanislau Hlebik
Excerpts from Stanislau Hlebik's message of 2017-02-17 11:24:34 +0000:
> As I said before we will load all non-public revs in one set and all

The problem is, loading a Python set from disk is O(size-of-the-set).

Bitmap's loading cost should be basically 0 (with mmap). I think that's why
we want bitmap at the first place. There are other choices like packfile
index, hash tables, but bitmap is the simplest and most efficient.

> precursor revs in another set. So `ispublic()` and `isprecursor()`
> checks are very fast, it's just a set lookup. Same with update - it's
> just an insertion in the set.

If cache invalidation happens for common write operations and the cache
needs to be rebuilt from scratch, then the solution is not better than
stateful chg - the chg way is easier (no need to serialize data), and safer
(not vulnerable to cache file corruption).

To make the bitmap cache really useful (solving problems that chg cannot not
solve), we need to incrementally update the cache, and avoid rebuilding it
from scratch, at least for common commands like "rebase", "amend", "commit".

> > Some code can be changed - "scmutil.filteredhash" seems to be one user
> > that iterates "filteredrevs". But what it needs is only a hash - it
> > could hash something else, like the mtime, size etc.
>
> Bookmarks, changelog, obsstore and tags can affect filtered set.
> For filtered repo we'll need to use size + mtime of bookmarks,
> changelog, obsstore, tags and maybe even smth else. That maybe
> error-prone.

The "filteredhash" seems to be only used by tags and branchmap cache. I
think we can actually get rid of it by only caching unfiltered versions and
do the filtering later. It also seems wrong to have tags and filteredrevs
mutually depend on each other.

> > Bitmaps could also be smarter - like maintaining the min and max revisions
> > so it does not need to be exactly len(repo).
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFC: bitmap storage for precursors and phases

Stanislau Hlebik
Excerpts from Jun Wu's message of 2017-02-17 10:30:45 -0800:
> Excerpts from Stanislau Hlebik's message of 2017-02-17 11:24:34 +0000:
> > As I said before we will load all non-public revs in one set and all
>
> The problem is, loading a Python set from disk is O(size-of-the-set).
>
> Bitmap's loading cost should be basically 0 (with mmap). I think that's why
> we want bitmap at the first place. There are other choices like packfile
> index, hash tables, but bitmap is the simplest and most efficient.

Agree, but Python set is going to be small and loading it is also fast.

>
> > precursor revs in another set. So `ispublic()` and `isprecursor()`
> > checks are very fast, it's just a set lookup. Same with update - it's
> > just an insertion in the set.
>
> If cache invalidation happens for common write operations and the cache
> needs to be rebuilt from scratch, then the solution is not better than
> stateful chg - the chg way is easier (no need to serialize data), and safer
> (not vulnerable to cache file corruption).
>
> To make the bitmap cache really useful (solving problems that chg cannot not
> solve), we need to incrementally update the cache, and avoid rebuilding it
> from scratch, at least for common commands like "rebase", "amend", "commit".
>
> > > Some code can be changed - "scmutil.filteredhash" seems to be one user
> > > that iterates "filteredrevs". But what it needs is only a hash - it
> > > could hash something else, like the mtime, size etc.
> >
> > Bookmarks, changelog, obsstore and tags can affect filtered set.
> > For filtered repo we'll need to use size + mtime of bookmarks,
> > changelog, obsstore, tags and maybe even smth else. That maybe
> > error-prone.
>
> The "filteredhash" seems to be only used by tags and branchmap cache. I
> think we can actually get rid of it by only caching unfiltered versions and
> do the filtering later. It also seems wrong to have tags and filteredrevs
> mutually depend on each other.

If we can get rid of `filteredhash` then we can probably proceed with
bitmaps. Although using bitmaps will still require bigger code changes
than using serialized sets.

>
> > > Bitmaps could also be smarter - like maintaining the min and max revisions
> > > so it does not need to be exactly len(repo).
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFC: bitmap storage for precursors and phases

Bryan O'Sullivan
In reply to this post by Jun Wu
On Fri, Feb 17, 2017 at 10:30 AM, Jun Wu <[hidden email]> wrote:
Excerpts from Stanislau Hlebik's message of 2017-02-17 11:24:34 +0000:
> As I said before we will load all non-public revs in one set and all

The problem is, loading a Python set from disk is O(size-of-the-set).

Bitmap's loading cost should be basically 0 (with mmap). I think that's why
we want bitmap at the first place. There are other choices like packfile
index, hash tables, but bitmap is the simplest and most efficient.

Hey folks,

I haven't yet seen mention of some considerations that seem very important in driving the decision-making, so I'd appreciate it if someone could fill me in.

Firstly, what's our current understanding of the sizes and compositions of these sets of numbers? In theory, we have a lot of data from practical application at Facebook, but nobody's brought it up.

To clarify why this matters, if an obsstore contains say 100 entries, then a very naive implementation should be fine. If it's related to something else, such as the size of, amount of activity in, or local age of a repo, then maybe we have a different set of decisions to make.

Secondly, what operations are being performed on these sets today, and do we know if those operations are going to change?

For instance, if a may be extended via new elements added to its end, then an mmap-based implementation immediately becomes awkward. If a set is traversed or we perform operations like union or intersection, then a bitmap might make sense if the set is very small and dense, but not if it's big and sparse.

(Jun's idea that mmap is effectively free is also not borne out in practice, but that's a second-order issue here.)

So tell us what the needed API is, what the characteristics of the data are, and that'll help understand how to make a good decision.

_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFC: bitmap storage for precursors and phases

Stanislau Hlebik
Excerpts from Bryan O'Sullivan's message of 2017-02-17 13:29:58 -0800:

> On Fri, Feb 17, 2017 at 10:30 AM, Jun Wu <[hidden email]> wrote:
>
> > Excerpts from Stanislau Hlebik's message of 2017-02-17 11:24:34 +0000:
> > > As I said before we will load all non-public revs in one set and all
> >
> > The problem is, loading a Python set from disk is O(size-of-the-set).
> >
> > Bitmap's loading cost should be basically 0 (with mmap). I think that's why
> > we want bitmap at the first place. There are other choices like packfile
> > index, hash tables, but bitmap is the simplest and most efficient.
> >
>
> Hey folks,
>
> I haven't yet seen mention of some considerations that seem very important
> in driving the decision-making, so I'd appreciate it if someone could fill
> me in.
>
> Firstly, what's our current understanding of the sizes and compositions of
> these sets of numbers? In theory, we have a lot of data from practical
> application at Facebook, but nobody's brought it up.

I assume that both sets (set for nonpublic commits and set for
obsstore) are going to be very small comparing to the repo size. I
expect both sets < 1% of the repo size. And the sets is going to be
sparse.

>
> To clarify why this matters, if an obsstore contains say 100 entries, then
> a very naive implementation should be fine. If it's related to something
> else, such as the size of, amount of activity in, or local age of a repo,
> then maybe we have a different set of decisions to make.
>
> Secondly, what operations are being performed on these sets today, and do
> we know if those operations are going to change?
>
> For instance, if a may be extended via new elements added to its end, then
> an mmap-based implementation immediately becomes awkward. If a set is
> traversed or we perform operations like union or intersection, then a
> bitmap might make sense if the set is very small and dense, but not if it's
> big and sparse.

We need three operations: iteration over the whole
set, testing if an element belongs to the set and adding new elements.
New elements can be added to the end of the set.
Not sure if more operations will be added, probably not.

Agree, we can't use bitmaps for iteration.
But potentially we can clean all of the places where iteration is used.
Most of them are pretty straight-forward except for one tricky case in
scmutil.filteredhash() function.

>
> (Jun's idea that mmap is effectively free is also not borne out in
> practice, but that's a second-order issue here.)
>
> So tell us what the needed API is, what the characteristics of the data
> are, and that'll help understand how to make a good decision.
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFC: bitmap storage for precursors and phases

Jun Wu
In reply to this post by Bryan O'Sullivan
Excerpts from Bryan O'Sullivan's message of 2017-02-17 13:29:58 -0800:

> On Fri, Feb 17, 2017 at 10:30 AM, Jun Wu <[hidden email]> wrote:
>
> > Excerpts from Stanislau Hlebik's message of 2017-02-17 11:24:34 +0000:
> > > As I said before we will load all non-public revs in one set and all
> >
> > The problem is, loading a Python set from disk is O(size-of-the-set).
> >
> > Bitmap's loading cost should be basically 0 (with mmap). I think that's why
> > we want bitmap at the first place. There are other choices like packfile
> > index, hash tables, but bitmap is the simplest and most efficient.
> >
>
> Hey folks,
>
> I haven't yet seen mention of some considerations that seem very important
> in driving the decision-making, so I'd appreciate it if someone could fill
> me in.
>
> Firstly, what's our current understanding of the sizes and compositions of
> these sets of numbers? In theory, we have a lot of data from practical
> application at Facebook, but nobody's brought it up.
>
> To clarify why this matters, if an obsstore contains say 100 entries, then
> a very naive implementation should be fine. If it's related to something
> else, such as the size of, amount of activity in, or local age of a repo,
> then maybe we have a different set of decisions to make.

First, I agree that if N is small, O(N) or O(log(N)), or O(1), do not matter
much in practice. I tend to always reason about future-proof solutions,
which may or may not be a good habit - I'll adjust in the future.

I think there are multiple topics being discussed:

  1. How to solve the overhead loading hiddenrevs with minimal changes?
  2. Why is the bitmap format interesting (future use-cases)?

For 1,

  I once tried to make one bit of revlog flag as the source of truth of
  "hidden", which looks a nice solution. But it got rejected because revlog
  has to be append-only. Moving the flag bits out formed the bitmap idea
  naturally.

  Bitmap fits here but it does not mean we have to use it to solve this
  particular problem. For example, stateful chg will get us some easy wins,
  and see the problem analysis and proposed solution below.

  Note that we actually have a simple caching format for the "hiddenrevs".
  See repoview.py, computehidden.

  But what's wrong with "computehidden" is the choice of the cache key - it
  calls "hideablerevs", which takes 0.5 seconds in obsolete.getrevs(repo,
  'obsolete'), which reads the giant obsstore (in hg-committed) and do some
  complex computation. The result is then hashed and used as a cache key of
  "computehidden".

  So, if we do not want to know the "obsolete()" revisions, but just want to
  know what's hidden. Changing the cache key would be a very straightforward
  and effective solution which does not require a new file format.

  That'll help commands like "hg st", "hg log -T foo", etc. But it won't
  help complex "log"s which do require the "obsolete()" revisions. To speed
  up the latter, we can probably just copy the caching infrastructure from
  repoview.py.

For 2,

  len(filteredrevs) in my hg-committed is 2155. It's small. I tend to think
  about solutions that scale longer and are not too complex to build. That
  may or may not be a good habit.
 
  That said, I agree if the set is small and perf, then whatever easier to
  implement makes sense.
 
  I have some crazy ideas about the future where a giant bitmap is
  practically useful. For example, source controlling "visible heads" and
  the obsstore, then adding some C code to quickly fill a bitmap (requires
  "set" and "setancestors/descendants"), then we can show the user whenever
  their repo used to look like back in any time by using the bitmap filter.
  That'll help investigate hard issues, and implement "hg undo".

> Secondly, what operations are being performed on these sets today, and do
> we know if those operations are going to change?
>
> For instance, if a may be extended via new elements added to its end, then
> an mmap-based implementation immediately becomes awkward. If a set is

Not that bad if we preallocate space.

> traversed or we perform operations like union or intersection, then a
> bitmap might make sense if the set is very small and dense, but not if it's
> big and sparse.

Not necessarily true if we use an advanced variant, like the "roaringbitmap"
mentioned by you last year [1].

[1]: https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-September/087864.html

> (Jun's idea that mmap is effectively free is also not borne out in
> practice, but that's a second-order issue here.)
>
> So tell us what the needed API is, what the characteristics of the data
> are, and that'll help understand how to make a good decision.

_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFC: bitmap storage for precursors and phases

Jun Wu
In reply to this post by Stanislau Hlebik
Excerpts from Stanislau Hlebik's message of 2017-02-17 16:06:33 +0000:
> This is implementation of two caches (nonpublic + precursor) using
> serialized sorted lists and sets
> https://bitbucket.org/stashlebik/hg/commits/99879579ac2848a2567810b677d8344150a7b319?at=hiddenbitmaps_lists

I had a quick look. Here are my suggestions:

1. Prefer caching at a lower-level, so they get more widely used.

   Practically, do not change localrepo.py, change phasecache.py and
   obsolete.py instead.

   I happened to do some refactoring in phasecache [1]. I'll send a V2 soon,
   since the change seems related.

   Note: phasecache is some C code so you need some data to prove the change
   is worthy.

2. The "Python set" serialization is already in repoview.py. Reuse it.

   Maybe move "_writehiddencache" to scmutil, or a new simple class.
   "tryreadcache" could be changed and then moved.

   Then the functions could be reused in obsolete.getrevs.

[1]: https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-February/092817.html
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFC: bitmap storage for precursors and phases

Stanislau Hlebik
In reply to this post by Jun Wu
Excerpts from Jun Wu's message of 2017-02-17 19:14:12 -0800:

> Excerpts from Bryan O'Sullivan's message of 2017-02-17 13:29:58 -0800:
> > On Fri, Feb 17, 2017 at 10:30 AM, Jun Wu <[hidden email]> wrote:
> >
> > > Excerpts from Stanislau Hlebik's message of 2017-02-17 11:24:34 +0000:
> > > > As I said before we will load all non-public revs in one set and all
> > >
> > > The problem is, loading a Python set from disk is O(size-of-the-set).
> > >
> > > Bitmap's loading cost should be basically 0 (with mmap). I think that's why
> > > we want bitmap at the first place. There are other choices like packfile
> > > index, hash tables, but bitmap is the simplest and most efficient.
> > >
> >
> > Hey folks,
> >
> > I haven't yet seen mention of some considerations that seem very important
> > in driving the decision-making, so I'd appreciate it if someone could fill
> > me in.
> >
> > Firstly, what's our current understanding of the sizes and compositions of
> > these sets of numbers? In theory, we have a lot of data from practical
> > application at Facebook, but nobody's brought it up.
> >
> > To clarify why this matters, if an obsstore contains say 100 entries, then
> > a very naive implementation should be fine. If it's related to something
> > else, such as the size of, amount of activity in, or local age of a repo,
> > then maybe we have a different set of decisions to make.
>
> First, I agree that if N is small, O(N) or O(log(N)), or O(1), do not matter
> much in practice. I tend to always reason about future-proof solutions,
> which may or may not be a good habit - I'll adjust in the future.
>
> I think there are multiple topics being discussed:
>
>   1. How to solve the overhead loading hiddenrevs with minimal changes?
>   2. Why is the bitmap format interesting (future use-cases)?
>
> For 1,
>
>   I once tried to make one bit of revlog flag as the source of truth of
>   "hidden", which looks a nice solution. But it got rejected because revlog
>   has to be append-only. Moving the flag bits out formed the bitmap idea
>   naturally.
>
>   Bitmap fits here but it does not mean we have to use it to solve this
>   particular problem. For example, stateful chg will get us some easy wins,
>   and see the problem analysis and proposed solution below.
>
>   Note that we actually have a simple caching format for the "hiddenrevs".
>   See repoview.py, computehidden.
>
>   But what's wrong with "computehidden" is the choice of the cache key - it
>   calls "hideablerevs", which takes 0.5 seconds in obsolete.getrevs(repo,
>   'obsolete'), which reads the giant obsstore (in hg-committed) and do some
>   complex computation. The result is then hashed and used as a cache key of
>   "computehidden".
>
>   So, if we do not want to know the "obsolete()" revisions, but just want to
>   know what's hidden. Changing the cache key would be a very straightforward
>   and effective solution which does not require a new file format.

Hm, changing the cache key to smth like phaseroots + obsstore +
changelog mtime + size  may be an easy and good win. That's a good idea.
Although it will still get invalidated after each write.

>
>   That'll help commands like "hg st", "hg log -T foo", etc. But it won't
>   help complex "log"s which do require the "obsolete()" revisions. To speed
>   up the latter, we can probably just copy the caching infrastructure from
>   repoview.py.
>
> For 2,
>
>   len(filteredrevs) in my hg-committed is 2155. It's small. I tend to think
>   about solutions that scale longer and are not too complex to build. That
>   may or may not be a good habit.
>  
>   That said, I agree if the set is small and perf, then whatever easier to
>   implement makes sense.
>  
>   I have some crazy ideas about the future where a giant bitmap is
>   practically useful. For example, source controlling "visible heads" and
>   the obsstore, then adding some C code to quickly fill a bitmap (requires
>   "set" and "setancestors/descendants"), then we can show the user whenever
>   their repo used to look like back in any time by using the bitmap filter.
>   That'll help investigate hard issues, and implement "hg undo".
>
> > Secondly, what operations are being performed on these sets today, and do
> > we know if those operations are going to change?
> >
> > For instance, if a may be extended via new elements added to its end, then
> > an mmap-based implementation immediately becomes awkward. If a set is
>
> Not that bad if we preallocate space.
>
> > traversed or we perform operations like union or intersection, then a
> > bitmap might make sense if the set is very small and dense, but not if it's
> > big and sparse.
>
> Not necessarily true if we use an advanced variant, like the "roaringbitmap"
> mentioned by you last year [1].
>
> [1]: https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-September/087864.html
>
> > (Jun's idea that mmap is effectively free is also not borne out in
> > practice, but that's a second-order issue here.)
> >
> > So tell us what the needed API is, what the characteristics of the data
> > are, and that'll help understand how to make a good decision.
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
12
Loading...