caching pull - stable partitioning of bundle requests

caching pull - stable partitioning of bundle requests

Boris Feld
Hi everyone,

Pulling from a server involves expensive server-side computation that we wish to cache. However, since the client can pull any arbitrary set of revisions, grouping and dispatching the data to be cached is a hard problem.

When we implemented the new discovery for obsolescence markers, we developed a "stablerange" method to build an efficient way to slice the changesets graph into ranges. In addition to solving the obsolescence-markers discovery problem, this "stablerange" principle seemed useful for more use cases, in particular the caching of pulls.

Right now, here is how the current pull bundle implementation works: you manually create and manually declare bundles containing either all changesets (which could also be used for clone bundles) or more specific ones. When the client requests some changesets, the server searches for a bundle containing the needed range and sends it. This often involves more than the requested data, so the client needs to filter out the extraneous data. Then the client does a discovery to catch any changesets missing from the bundle. If the server doesn't find a valid pull bundle, a normal discovery is done. The manual bundle management is suboptimal, the search for appropriate bundles has poor algorithmic complexity, and the extra roundtrip and discovery add extra slowness.
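To see why that lookup scales poorly, here is a rough sketch, in Python, of the kind of linear scan that manual declaration implies. This is purely illustrative and is not Mercurial's actual implementation; the data layout and function name are made up:

    # Illustrative only: schematic lookup over manually declared pull bundles.
    # `declared` is a list of (path, bundle_heads, bundle_bases) tuples set up
    # by the server administrator.
    def find_pull_bundle(declared, client_common, requested_heads):
        for path, bundle_heads, bundle_bases in declared:
            # A bundle is usable if it contains at least what was asked for,
            # even though it may contain much more than requested.
            if (set(requested_heads) <= set(bundle_heads)
                    and set(bundle_bases) <= set(client_common)):
                return path  # the client will filter out the extraneous data
        return None  # no match: fall back to a normal discovery + changegroup

The cost of this scan grows with the number of declared bundles, which limits how many bundles can usefully be declared.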

This weekend, we built a "simple" prototype that uses "stablerange" to slice the changegroup requested in "getbundle" into multiple bundles that can be reused from one pull to another. That slicing happens as part of a normal pull, during the getbundle call and after the normal discovery. There is no need for an extra discovery and getbundle call afterwards.

With this "stablerange"-based strategy, we start from the set of requested changesets and generate a set of "standard" ranges covering all of them. This slicing has a good algorithmic complexity that depends on the size of the selected "missing" set of changesets, so the associated cost scales well with the size of the associated pull. In addition, we no longer have to do an expensive search through a list of existing bundles. This helps small pulls scale and increases the number of bundles we can cache, as the time spent selecting bundles no longer depends on the number of cached ones. Since we can exactly cover the client request, we also no longer need to issue an extra pull roundtrip after the cache retrieval.
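To make the idea concrete, here is a heavily simplified sketch of such "standard" slicing for a purely linear history. The real stablerange algorithm also handles merges (through a stable sort of ancestors); the function below and the linearity assumption are ours, only meant to show why the produced ranges are reusable from one pull to the next:

    # Simplified sketch: cut the revision span [start, stop) of a *linear*
    # history into power-of-two sized ranges aligned on their own size.
    # Because the alignment only depends on absolute positions in the graph,
    # two pulls requesting overlapping spans share most of their slices.
    def standard_slices(start, stop):
        slices = []
        cur = start
        while cur < stop:
            size = 1
            # grow to the largest aligned power-of-two block that still fits
            while cur % (size * 2) == 0 and cur + size * 2 <= stop:
                size *= 2
            slices.append((cur, cur + size))
            cur += size
        return slices

    # standard_slices(0, 300) -> [(0, 256), (256, 288), (288, 296), (296, 300)]
    # standard_slices(0, 280) -> [(0, 256), (256, 272), (272, 280)]
    # Both pulls reuse the big (0, 256) slice; only the small tail differs.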

That slicing focuses on producing ranges that:
  • Have a high chance to be reusable in a pull selecting similar changesets,
  • Gather most of the changesets in large bundles.

This caching strategy inherits the nice "stablerange" properties regarding repository growth:
  • When a few changesets are appended to a repository, only a few new ranges have to be added.
  • The overall number of ranges (and associated bundles) needed to represent all possible ranges grows in O(N log N).

For example, here are the 15 ranges selected for a full clone of mozilla-central:

  • 262114 changesets
  • 30 changesets
  • 65536 changesets
  • 32741 changesets
  • 20 changesets
  • 7 changesets
  • 8192 changesets
  • 243 changesets
  • 13 changesets
  • 114 changesets
  • 14 changesets
  • 32 changesets
  • 16 changesets
  • 8 changesets
  • 1 changesets

If we only clone a subset of the repository, the larger ranges get reused (hg clone --rev -5000):
  • 262114 changesets found in caches
  • 30 changesets found in caches
  • 65536 changesets found in caches
  • 32741 changesets found in caches
  • 20 changesets found in caches
  • 7 changesets found in caches
  • 2048 changesets found
  • 1024 changesets found
  • 482 changesets found
  • 30 changesets found
  • 32 changesets found
  • 1 changesets found
  • 7 changesets found
  • 4 changesets found
  • 2 changesets found
  • 1 changesets found
As you can see, the larger ranges of this second pull are common with the previous pull, allowing cached bundles to be reused.

The prototype is available in a small "pullbundle" extension. It focuses on the slicing itself and we did not implement anything fancy for the cache storage and delivery: we simply store generated bundles on disk and read them back when they are needed again. Others, like Joerg Sonnenberger or Gregory Szorc, are already working on the "cache delivery" problem.
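The storage side really is as naive as it sounds. Something along these lines, where the directory layout, class and range key are illustrative and not the extension's actual on-disk format:

    import os

    # One bundle file per generated range, keyed by the range identity
    # (for example a (head, slice index) pair). Illustrative sketch only.
    class NaiveBundleCache(object):
        def __init__(self, cachedir):
            self.cachedir = cachedir
            if not os.path.isdir(cachedir):
                os.makedirs(cachedir)

        def _path(self, rangeid):
            head, index = rangeid
            return os.path.join(self.cachedir, '%s-%d.hg' % (head, index))

        def get(self, rangeid):
            path = self._path(rangeid)
            if os.path.exists(path):
                with open(path, 'rb') as f:
                    return f.read()  # cache hit: reuse the stored bundle
            return None

        def put(self, rangeid, data):
            with open(self._path(rangeid), 'wb') as f:
                f.write(data)  # cache miss: keep the bundle for the next pull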

We are getting good results out of that prototype when testing it on clones of mozilla-central and netbsd-src. See the "Example results" section for details.

The prototype is up and running on our hgweb "mirror" instance https://mirror.octobus.net/.

The extension comes with a small debug command that produces statistics about the ranges that multiple random pulls would use.

The "stablerange" implementation currently still[1] lives in the evolve extension, so for simplicity we put the "pullbundle" extension in the same repository. This is not ideal but was a simple solution given the time we could dedicate. This extension is not part of any official release yet. To test it, you have to install it from the repository for now: https://www.mercurial-scm.org/repo/evolve/#default


The prototype performance is not stellar, but good enough to give useful results in a reasonable amount of time. A production-grade implementation of the stablerange algorithm and storage will fix that. There is also room for improvement in the algorithms themselves; multiple sub-problems can be improved. We have started having regular meetings with university researchers working on graph theory who are interested in the problem space.

[1] The stablerange principle has been validated in the field and is ready to get upstreamed.

Example results.


The extension comes with a command to simulate multiple pulls of random sets of revisions (drawn from a larger set of revisions we define). The simulation starts with a cold cache for simplicity.
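Roughly, what the `debugpullbundlecacheoverlap` command measures looks like the loop below. This is an illustrative sketch of the measurement, not the command's actual implementation; `slicer` stands for the stablerange-based slicing and is assumed to return hashable range identifiers:

    import random

    def simulate(candidate_revs, slicer, count=100):
        cache = set()  # cold cache
        for _ in range(count):
            k = random.randint(1, len(candidate_revs))
            pulled = random.sample(list(candidate_revs), k)
            ranges = slicer(pulled)
            hits = [r for r in ranges if r in cache]  # ranges already produced
            cache.update(ranges)
            yield len(pulled), len(ranges), len(hits)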

Mozilla Central


We gathered 100 sample pulls within 20443 revisions:
  • Median pull size: 18 338
  • Median number of changegroups used: 132
  • Median changesets not cached: 88
  • Median ratio of changesets already in the cache: 99.5%

The number of distinct ranges stays under control, as expected:
  • Total changesets served: 1 817 955
  • Total changeset cache hit ratio: 96% (from a cold cache)
  • Distinct ranges cached: 12 983, most of them very small (90% ≤ 32 changesets)
  • A small number of (larger) ranges get most of the cache hits.

Serving the smaller ranges from cache might not be a good tradeoff. If we skip using the cache for smaller ranges, we still get interesting results.
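The threshold itself is just a size check when a range is served. A minimal sketch, assuming a cache object and a bundle generator along the lines of the earlier sketch (the --min-cache option passed to the debug command below plays this role in the simulation):

    # Only ranges of at least `min_cache` changesets are written to and served
    # from the cache; smaller ranges are cheap enough to regenerate on the fly.
    def serve_range(cache, rangeid, size, generate, min_cache=256):
        if size >= min_cache:
            data = cache.get(rangeid)
            if data is None:
                data = generate(rangeid)
                cache.put(rangeid, data)
            return data
        return generate(rangeid)  # too small to be worth caching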

Only caching ranges containing 256 changesets or more:

  • Median pull size: 18 940
  • Median number of changegroups used: 12
  • Median changesets not cached: 1 949
  • Median ratio of changesets already in the cache: 90%
  • Total changesets served: 1 850 243
  • Total changeset cache hit ratio: 87%
  • Distinct ranges cached: 1 150

Another way to reduce the number of bundles the server has to use would be to do some "over-serving": using bundles that also contain some changesets the client already has.

(See the end of the email for full details)

netbsd


This time, we issued more random pulls (1000) within a somewhat smaller set of 10 000 changesets.

This resulted in smaller pulls, which also show good results:

  • Median pull size: 1673
  • Median number of changegroups used: 51
  • Median changesets not cached: 50
  • Median ratio of changesets already in the cache: 99%
  • Total changesets served: 1601087
  • Total changeset cache hit ratio: 96%
  • Distinct ranges cached: 51751

Trying the same 256+ changesets limit on caching, we see a stronger impact, probably because of the smaller pulls:

  • Median pull size: 1592
  • Median number of changegroups used: 2
  • Median changesets not cached: 663
  • Median ratio of changesets already in the cache: 46%
  • Total changesets served: 1 554 227
  • Total changeset cache hit ratio: 56%
  • Distinct ranges cached: 1 914

(See the end of the email for full details)

pypy


pypy testing was done using 1 000 pulls within 16687 changesets:

  • Median pull size: 11375
  • Median number of changegroups used: 1206
  • Median changesets not cached: 12
  • Median ratio of changesets already in the cache: 100%
  • Total changesets served: 11 167 863
  • Total changeset cache hit ratio: 99%
  • Distinct ranges cached: 1 139 537

Installing the 256+ changeset limit gives worse results. This is probably due to a shallower pull space and the number of merges in the pypy repository. The pypy repository is significantly more branchy than the other ones; there are known ways we could improve the stablerange partitioning in such cases (to produce larger ranges).

  • Median pull size: 11457
  • Median number of changegroups used: 9
  • Median changesets not cached: 7276
  • Median ratio of changesets already in the cache: 37%
  • Total changesets served: 11211093
  • Total changeset cache hit ratio: 36%
  • Distinct ranges cached: 8964

Full details:


mozilla central, 100 pulls in 20443 revisions, no limit

    mozilla-central> hg debugpullbundlecacheoverlap --count 100 -- 'tip~10000:' 
    gathering 100 sample pulls within 20443 revisions
    pull size:
      min: 13176
      10%: 16425
      25%: 17344
      50%: 18338
      75%: 19192
      90%: 19719
      95%: 19902
      max: 20272
    non-cached changesets:
      min: 4
      10%: 10
      25%: 24
      50%: 88
      75%: 237
      90%: 956
      95%: 3152
      max: 17440
    ratio of cached changesets:
      min: 0.0
      10%: 0.947941624918
      25%: 0.987343800064
      50%: 0.995036297078
      75%: 0.998774313882
      90%: 0.999421798208
      95%: 0.999634750848
      max: 0.999795354548
    bundle count:
      min: 74
      10%: 99
      25%: 113
      50%: 132
      75%: 146
      90%: 158
      95%: 169
      max: 186
    ratio of cached bundles:
      min: 0.0
      10%: 0.685082872928
      25%: 0.810810810811
      50%: 0.911392405063
      75%: 0.953020134228
      90%: 0.974683544304
      95%: 0.98125
      max: 0.993377483444
    changesets served:
      total:      1817955
      from cache: 1752642 (96%)
      bundle:       12983
    size of cached bundles:
      min: 1
      10%: 1
      25%: 2
      50%: 4
      75%: 9
      90%: 32
      95%: 64
      max: 8165
    hit on cached bundles:
      min: 1
      10%: 1
      25%: 1
      50%: 2
      75%: 4
      90%: 14
      95%: 20
      max: 100

mozilla central, 100 pulls in 20443 revisions, only caching ranges of 256 changesets and above

    mozilla-central > hg debugpullbundlecacheoverlap --count 100 --min-cache=256 -- 'tip~10000:'
    gathering 100 sample pulls within 20443 revisions
      not caching ranges smaller than 256 changesets
    pull size:
      min: 14060
      10%: 16910
      25%: 17923
      50%: 18940
      75%: 19471
      90%: 19884
      95%: 20029
      max: 20309
    non-cached changesets:
      min: 973
      10%: 1398
      25%: 1707
      50%: 1949
      75%: 2246
      90%: 2590
      95%: 3448
      max: 19551
    ratio of cached changesets:
      min: 0.0
      10%: 0.839908649729
      25%: 0.884512085944
      50%: 0.897293365889
      75%: 0.91018907563
      90%: 0.926944971537
      95%: 0.935139218649
      max: 0.946839315959
    bundle count:
      min: 4
      10%: 10
      25%: 11
      50%: 12
      75%: 12
      90%: 13
      95%: 15
      max: 16
    ratio of cached bundles:
      min: 0.0
      10%: 0.909090909091
      25%: 1.0
      50%: 1.0
      75%: 1.0
      90%: 1.0
      95%: 1.0
      max: 1.0
    changesets served:
      total:      1850243
      from cache: 1617379 (87%)
      bundle:        1150
    size of cached bundles:
      min: 256
      10%: 256
      25%: 256
      50%: 512
      75%: 1024
      90%: 1024
      95%: 1024
      max: 8165
    hit on cached bundles:
      min: 1
      10%: 1
      25%: 1
      50%: 7
      75%: 44
      90%: 44
      95%: 59
      max: 98

netbsd-src 1000 pulls within 10000 revisions:

    netbsd-src > hg debugpullbundlecacheoverlap --count 1000 -- '-10000:'
    gathering 1000 sample pulls within 10000 revisions
    pull size:
      min: 10
      10%: 339
      25%: 865
      50%: 1673
      75%: 2330
      90%: 2752
      95%: 2893
      max: 3466
    non-cached changesets:
      min: 0
      10%: 3
      25%: 7
      50%: 16
      75%: 50
      90%: 137
      95%: 239
      max: 2787
    ratio of cached changesets:
      min: 0.0
      10%: 0.781553398058
      25%: 0.940663176265
      50%: 0.987631184408
      75%: 0.996178830722
      90%: 0.99843688941
      95%: 0.998939554613
      max: 1.0
    bundle count:
      min: 10
      10%: 28
      25%: 37
      50%: 51
      75%: 65
      90%: 78
      95%: 88
      max: 121
    ratio of cached bundles:
      min: 0.0
      10%: 0.446808510638
      25%: 0.673076923077
      50%: 0.826086956522
      75%: 0.901639344262
      90%: 0.942857142857
      95%: 0.96
      max: 1.0
    changesets served:
      total:      1601087
      from cache: 1539491 (96%)
      bundle:       51751
    size of cached bundles:
      min: 1
      10%: 1
      25%: 1
      50%: 1
      75%: 4
      90%: 8
      95%: 16
      max: 2048
    hit on cached bundles:
      min: 1
      10%: 1
      25%: 1
      50%: 2
      75%: 3
      90%: 8
      95%: 13
      max: 291

netbsd-src 1000 pulls within 10000 revisions, not caching ranges smaller than 256:

    netbsd-src > hg debugpullbundlecacheoverlap --count 1000 --min-cache=256 -- '-10000:'
    gathering 1000 sample pulls within 10000 revisions
      not caching ranges smaller than 256 changesets
    pull size:
      min: 10
      10%: 329
      25%: 813
      50%: 1592
      75%: 2271
      90%: 2745
      95%: 2922
      max: 3719
    non-cached changesets:
      min: 10
      10%: 265
      25%: 440
      50%: 663
      75%: 911
      90%: 1111
      95%: 1229
      max: 2852
    ratio of cached changesets:
      min: 0.0
      10%: 0.0
      25%: 0.136752136752
      50%: 0.461261261261
      75%: 0.686327077748
      90%: 0.792263056093
      95%: 0.829552819183
      max: 0.959700093721
    bundle count:
      min: 0
      10%: 0
      25%: 1
      50%: 2
      75%: 3
      90%: 4
      95%: 4
      max: 6
    ratio of cached bundles:
      min: 0.0
      10%: 1.0
      25%: 1.0
      50%: 1.0
      75%: 1.0
      90%: 1.0
      95%: 1.0
      max: 1.0
    changesets served:
      total:      1554227
      from cache:  871680 (56%)
      bundle:        1914
    size of cached bundles:
      min: 256
      10%: 256
      25%: 256
      50%: 256
      75%: 512
      90%: 512
      95%: 512
      max: 2048
    hit on cached bundles:
      min: 2
      10%: 3
      25%: 7
      50%: 61
      75%: 113
      90%: 113
      95%: 117
      max: 267

pypy 1000 pulls within 16687 changesets:

    pypy > time hg debugpullbundlecacheoverlap --count 1000 -- 'tip~2000:'
    gathering 1000 sample pulls within 16687 revisions
    pull size:
      min: 5835
      10%: 9165
      25%: 10323
      50%: 11375
      75%: 12248
      90%: 12904
      95%: 13181
      max: 14221
    non-cached changesets:
      min: 0
      10%: 1
      25%: 4
      50%: 12
      75%: 39
      90%: 142
      95%: 453
      max: 12046
    ratio of cached changesets:
      min: 0.0
      10%: 0.986539780521
      25%: 0.99640167364
      50%: 0.99889963059
      75%: 0.99964598637
      90%: 0.999911496593
      95%: 1.0
      max: 1.0
    bundle count:
      min: 183
      10%: 762
      25%: 1045
      50%: 1206
      75%: 1308
      90%: 1387
      95%: 1427
      max: 1631
    ratio of cached bundles:
      min: 0.0
      10%: 0.972310126582
      25%: 0.990171990172
      50%: 0.995529061103
      75%: 0.997882851094
      90%: 0.999203821656
      95%: 1.0
      max: 1.0
    changesets served:
      total:      11167863
      from cache: 11060195 (99%)
      bundle:     1139537
    size of cached bundles:
      min: 1
      10%: 1
      25%: 1
      50%: 2
      75%: 4
      90%: 15
      95%: 30
      max: 2041
    hit on cached bundles:
      min: 1
      10%: 1
      25%: 1
      50%: 3
      75%: 20
      90%: 245
      95%: 848
      max: 999

pypy 1000 pulls within 16687 changesets, caching ranges of 256 changesets and above only:


    time hg debugpullbundlecacheoverlap --count 1000  --min-cache=256 -- 'tip~2000:'
    gathering 1000 sample pulls within 16687 revisions
      not caching ranges smaller than 256 changesets
    pull size:
      min: 3629
      10%: 9075
      25%: 10278
      50%: 11457
      75%: 12325
      90%: 12961
      95%: 13245
      max: 14330
    non-cached changesets:
      min: 2605
      10%: 5885
      25%: 6619
      50%: 7276
      75%: 7813
      90%: 8319
      95%: 8577
      max: 11815
    ratio of cached changesets:
      min: 0.0
      10%: 0.296190172981
      25%: 0.344310129221
      50%: 0.368110984417
      75%: 0.391840607211
      90%: 0.417344173442
      95%: 0.445362934971
      max: 0.544580009385
    bundle count:
      min: 1
      10%: 7
      25%: 8
      50%: 9
      75%: 10
      90%: 11
      95%: 11
      max: 12
    ratio of cached bundles:
      min: 0.0
      10%: 1.0
      25%: 1.0
      50%: 1.0
      75%: 1.0
      90%: 1.0
      95%: 1.0
      max: 1.0
    changesets served:
      total:      11211093
      from cache: 4066703 (36%)
      bundle:        8964
    size of cached bundles:
      min: 256
      10%: 256
      25%: 256
      50%: 468
      75%: 510
      90%: 1019
      95%: 512
      max: 1916
    hit on cached bundles:
      min: 1
      10%: 1
      25%: 3
      50%: 13
      75%: 63
      90%: 823
      95%: 153
      max: 958


Re: caching pull - stable partitioning of bundle requests

Joerg Sonnenberger
On Wed, Sep 26, 2018 at 08:13:13PM +0200, Boris FELD wrote:
> Then the client does a discovery to catch any missing changesets from
> the bundle. If the server doesn't find a valid pull bundle, a normal
> discovery is done. The manual bundle managements is suboptimal, the
> search for appropriate bundles has a bad complexity and the extra
> roundtrip and discovery adds extra slowness.

I don't think this classification is correct. The discovery phase is
run once, to find the common revisions and the missing heads. The
server decides, based on that, whether it wants to send a prebuilt
bundle. The client updates the sets *without* running discovery again
and just asks the server again. As such, the overhead is one
additional roundtrip per bundle. The search for appropriate bundles is
currently unnecessarily slow, because the necessary queries are
unnecessarily slow.
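In pseudo-code, the flow described above is roughly the following. This is schematic only: the object and method names are made up and this is not Mercurial's actual exchange code.

    def pull_with_pullbundles(client, server):
        # discovery runs exactly once
        common, missing_heads = client.run_discovery(server)
        while missing_heads:
            reply = server.getbundle(common=common, heads=missing_heads)
            client.apply(reply.bundle)
            if not reply.was_prebuilt_pullbundle:
                break  # a regular changegroup covered everything
            # A prebuilt bundle may not cover the whole request: update the
            # sets locally (no second discovery) and ask the server again,
            # at the cost of one extra roundtrip per bundle.
            common, missing_heads = client.update_sets_locally(reply)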

All that said, I will look at the stableranges and, more importantly,
how they deal with various situations.

Joerg

Re: caching pull - stable partitioning of bundle requests

Gregory Szorc
In reply to this post by Boris Feld

This experimentation is very useful and shows great promise!

Intelligent "slicing" of data in order to facilitate high cache hit rates and lower server work and client overhead is definitely worth pursuing.

FWIW I submitted my caching series for wire protocol version 2 a few hours ago. The meaningful patches are at https://phab.mercurial-scm.org/D4773 and https://phab.mercurial-scm.org/D4774. This work focuses on transparent whole-response caching, however, and does not (yet) implement a caching layer for e.g. storage access. I will note that I *really* want to enable Mercurial servers to offload as much data transfer to static file servers as possible.

An open item to solve is how we're going to facilitate bulk data transfer using fewer roundtrips in wire protocol version 2. Right now, the client has to send a *lot* of commands and data to the server (all the manifest and file nodes). We know we must do something better. I think this slicing idea could definitely be employed to facilitate higher cache hit rates. How, exactly, I'm not sure.

Something else to consider is that right now, an `hg clone` will preserve changelog order from server to client. This is trivial to change. And it is arguably an implementation detail of revlogs, which preserve insertion order. (And until https://www.mercurial-scm.org/repo/hg-committed/rev/db5501d93bcf it was the behavior for manifests and filelogs as well.) I'm not sure how important it is to preserve this property. But if we start slicing changesets, we could easily break this behavior.

This email caused me to consider the possibility that the "content redirects" feature proposed in D4774 should (eventually) support multiple locations. Then, a server could slice a large response into multiple payloads and have the client retrieve them separately.

Alternatively, since wire protocol version 2 currently performs a lot of granular requests, perhaps the server could inform the client of preferred slicing for data fetching and the client could follow up accordingly. How this would interact with bulk data transfer commands, I'm not sure.

There's definitely a lot to think about and I hope this slicing experimentation finds a home in core someday!
 



Re: caching pull - stable partitioning of bundle requests

Boris FELD-2


This slicing is based on the "stablerange" algorithm, the same one used to bisect obsmarkers during obsmarkers discovery.

The implementation of "stablerange" currently in use (in evolve) is a "research implementation". It is the result of the initial search for a stable slicing with good properties; work on it was not pushed further than proving the approach has such properties. It turns out the current implementation is "good enough" to provide usable solutions in some areas where the alternatives were severely lacking (e.g. pullbundle slicing, obsmarkers discovery, ...). However, that implementation needs to be made "production-grade" before we can activate it by default. The goal is to be as fast as today for most commands (commit, for example) and faster for exchange (obsmarkers/pullbundle). In particular, cache size and overall performance can be improved.

So, overall, we have a good solution that should move upstream. The overall algorithm is clear, so it is mostly "straightforward" engineering work to perform. That said, the amount of engineering work is significant. There are three high-level items to complete:

    - the basic caches should get a unified and simpler implementation (probably integrated into changelog-v3 for simplicity),

    - the stablerange cache itself will get better storage and will also store less data,

    - if needed, a compiled implementation will boost performance to the point where we don't have to worry about it (maybe in Rust?).


The stablerange computation uses a couple of values inherent to each node that we have to cache. The two simplest ones could just go into the changelog index (a naive way to compute them is sketched after the list):

    - the depth of a changeset, i.e. `len(::X)`,

    - the first merge ancestor, i.e. `max(merge() and ::X)`.
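For illustration, here is a naive way to compute those two values (the cached, incremental version obviously would not walk the full ancestor set for every changeset; `parents`, a mapping from a revision to its tuple of parent revisions, is a stand-in for the changelog index):

    def _ancestors(parents, rev):
        # all ancestors of `rev`, including itself; None marks a missing parent
        seen = set()
        stack = [rev]
        while stack:
            r = stack.pop()
            if r not in seen:
                seen.add(r)
                stack.extend(p for p in parents[r] if p is not None)
        return seen

    def depth(parents, rev):
        # len(::rev)
        return len(_ancestors(parents, rev))

    def first_merge_ancestor(parents, rev):
        # max(merge() and ::rev): the highest merge among the ancestors
        merges = [r for r in _ancestors(parents, rev)
                  if sum(1 for p in parents[r] if p is not None) == 2]
        return max(merges) if merges else None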

Stablerange also relies on a stable sort algorithm, which requires caching some data for each merge node. This could also be stored in the changelog (this time, in the data part).

Then comes the actual stablerange algorithm and its related cache. It computes and stores details of how a given range gets sliced into smaller ranges. Currently, we use SQLite to store this cache. This was a quick way to get storage running while experimenting; however, SQLite now accounts for a large share of the current performance cost and size, so we should move away from it in favor of a simpler on-disk format.
In addition, we don't need to cache as much data. In most cases, slicing a range is trivial and doesn't require any complex computation. We should compute these slices on the fly instead of writing them to the cache. This will shrink the cache size significantly (in addition to the gains from moving away from SQLite).

Finally, all this computation mostly deals with simple integers, tuples, and lists, an area where Python is notably sub-optimal. Writing this in a low-level language and giving it access to the low-level implementation of the changelog's index will provide a massive performance boost. The overall computational complexity of the stablerange algorithms is good, so given our previous experience with writing compiled versions of such algorithms, we can expect performance concerns to go away once we have it.


The road for moving this into Core is clear, but not short. So far we have not been able to free the necessary time to do it. Between the paying-client work we have to do to pay salaries (and keep users happy) and all the time we are already investing in the community, we are already fairly busy.

In early 2017, Bitbucket gave $13,500 to the Mercurial project to be spent on helping evolution move forward. As far as we know, this money is still unspent. Since stablerange is a critical part of obsmarkers discovery, unlocking this money to be spent on upstreaming stablerange would be a good idea (and fits its initial purpose). Paying for this kind of work would reduce the contention with client work and help us, or others, dedicate time to it sooner rather than later.


Re: caching pull - stable partitioning of bundle requests

Erik van Zijst
On Thu, Oct 4, 2018 at 8:16 AM Boris FELD <[hidden email]> wrote:
> This slicing is based on the "stablerange" algorithm. The same as the one used to bisect obsmarkers during obsmarkers discovery.
>
> The road for moving this in Core is clear, but not short. So far we have not been able to free the necessary time to do it. Between the paying-client work, we have to do to pay salaries (and keep users happy) and all the time we are already investing in the community, we are already fairly busy.
>
> In early 2017, Bitbucket gave $13, 500 to the Mercurial project to be spent to help evolution to move forward. As far as we know, this money is still unspent. Since stable range is a critical part of obsmarkers discovery, unlocking this money to be spent on upstreaming stable range would be a good idea (and fits its initial purposes). Paying for this kind of work will reduce the contention with client work and help us, or others, to dedicate time for it sooner than later.

Safe rebasing remains a priority for Bitbucket and the biggest reason
Mercurial has lost feature parity with Git on Bitbucket over the past
years as we added online merge strategies, squashing and rebasing.

While smf worked hard to add Evolve as an experimental labs feature,
along with pull request squash/rebasing, we sadly had to remove it
after customers opted in without properly realizing that Evolve is not
a Core feature and requires local installation of the extension across
all clients, plunging their workflows and repos into disarray.

Early access to non-core functionality is fine for skilled developers
thoroughly familiar with their tools, but it has proved unsuitable
for average users, and as long as things don't land in Core they
remain out of reach for everyone. This was the motivation behind the
donation: to get Evolve, or at least the obsmarker exchange, into Core
and enabled by default. Only then can Bitbucket start to offer
rebasing workflows, regain parity with Git and position Mercurial as
the superior VCS it should be.

We sent the donation through the SFC to leave the final selection of
Evolve contributors at the discretion of the steering committee. We
had pressed for speedy allocation to those contributing materially to
Evolve's development, yet 18 months later it still has not been spent.
Even so, development of Evolve has seen substantial progress and it
seems fair to allocate the funds accordingly. The fact that the work
on obsmarker discovery is now being leveraged to extend clonebundles
to regular, non-clone pulls seems to further justify applying the
money towards this work.

Cheers,
Erik

Re: caching pull - stable partitioning of bundle requests

Pulkit Goyal
In reply to this post by Boris FELD-2


On Thu, Oct 4, 2018 at 6:16 PM Boris FELD <[hidden email]> wrote:

The road for moving this in Core is clear, but not short. So far we have not been able to free the necessary time to do it. Between the paying-client work, we have to do to pay salaries (and keep users happy) and all the time we are already investing in the community, we are already fairly busy.


In early 2017, Bitbucket gave $13, 500 to the Mercurial project to be spent to help evolution to move forward. As far as we know, this money is still unspent. Since stable range is a critical part of obsmarkers discovery, unlocking this money to be spent on upstreaming stable range would be a good idea (and fits its initial purposes). Paying for this kind of work will reduce the contention with client work and help us, or others, to dedicate time for it sooner than later.

I definitely agree that obsmarker discovery is a critical part. Pulling from `hg-committed` is sometimes slower than pulling from a repo (5-7x the size of hg-committed) whose server has thousands of heads.

Do you have any updates on the stablerange cache? In its current state the cache is pretty big and a lot of people have faced problems with the cache size. Also, with strip or some other commands, it rebuilds the cache, which takes more than 10 minutes on a large repo and is definitely a bummer. Are you working on making it faster and smaller? How is the experimentation with evolve+stablerange going?


Re: caching pull - stable partitioning of bundle requests

Boris Feld
On 15/02/2019 21:16, Pulkit Goyal wrote:


> I definitely agree that obsmarker discovery is a critical part. Pulling from `hg-committed` is sometimes slower than pulling from a repo 5-7x its size whose server has thousands of heads.

> Do you have any updates on the stable-range cache? In its current state the cache is pretty big, and a lot of people have run into problems with its size. Also, strip and some other commands rebuild the cache, which takes more than 10 minutes on a large repo and is definitely a bummer. Are you working on making it faster and smaller? How is the experimentation with evolve+stablerange going?

# Regarding the cache-size:

We know that the current version caches many entries that are trivial to compute and do not need to be cached. In addition, the current storage (SQLite) does not seem very efficient.

So the next iteration of the cache should be significantly smaller.
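
As an illustration of the first point, here is a minimal, hypothetical sketch; the names and the (rev, depth, length) shape of a cache entry are assumptions for the example, not the actual evolve code:

    def entries_worth_caching(ranges):
        """Yield only the stable-range entries worth persisting.

        `ranges` is assumed to be an iterable of (rev, depth, length)
        tuples, where `length` is the number of changesets the range
        covers. A single-changeset range is cheaper to recompute on
        the fly than to store and look up, so it is skipped.
        """
        for rev, depth, length in ranges:
            if length <= 1:
                continue
            yield (rev, depth, length)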


# Regarding cache invalidation:

A lot of the data in the caches is an inherent property of the changesets and therefore immutable. It's easy to preserve it during strip to avoid having to recompute things from scratch. In addition, this immutable data should be exchanged during pull alongside the associated changesets, to avoid clients recomputing the same data over and over.
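
To illustrate the strip case, here is a minimal hypothetical sketch (the cache layout and names are assumptions, not the current implementation): a range only depends on the ancestors of its head revision, so a strip only invalidates the entries whose head revision was removed.

    def prune_cache_after_strip(cache, stripped_revs):
        """Drop only the cache entries invalidated by a strip.

        `cache` is assumed to map (rev, depth) keys to cached range
        data; an entry whose head revision `rev` survives the strip
        covers only surviving ancestors and therefore stays valid.
        """
        return {
            key: value
            for key, value in cache.items()
            if key[0] not in stripped_revs
        }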

The current implementation is an experimental/research implementation; all this should get smoothed directly in Core during the upstreaming.


# Regarding cache-computation speed:

The current implementation is a "research" version written in Python; it is not geared toward efficiency and contains a lot of indirections that were helpful to reach the current solution but are now getting in the way of performance.

The initial implementation (in Evolve) focused on finding a solution with good scaling properties (good computational and space complexity). However, we did not spend much time improving the "constant" factor. Now that we know where we are headed, we can have a much better implementation.

Once we have better on-disk storage, native code, and client/server exchange of most of the data, the impact of stable-range should become negligible.


# Regarding what's next:

The experimental implementation cleared up the unknowns around stable-range computation and caching. However, even though the road is clear, a sizable amount of work remains, especially to move away from the unsuitable SQLite storage. We think that putting the Bitbucket donation to use is the best way to make sure this work gets done soon.


Re: caching pull - stable partitioning of bundle requests

Pulkit Goyal


On Tue, Feb 19, 2019 at 3:23 PM Boris FELD <[hidden email]> wrote:

> The current implementation is an experimental/research implementation; all this should get smoothed directly in Core during the upstreaming.

I am a bit confused when you say "things should get smoothed directly in Core during the upstreaming". Which of the following did you mean:

1) send a patch of the current implementation to core; once that patch gets in, try to improve the implementation in core
2) send a series to core which contains a patch of the current implementation plus other patches improving the implementation

2) is the same as what Augie did for narrow, linelog, and remotefilelog, what Greg did for sparse, and what I did for infinitepush.

Which one do you mean here?
