D21: rebase: rewrite core algorithm (issue5578) (issue5630)

Previous Topic Next Topic
classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view

D21: rebase: rewrite core algorithm (issue5578) (issue5630)

pulkit (Pulkit Goyal)
quark added inline comments.


> martinvonz wrote in rebase.py:392
> IIUC, the reason you no longer need to pass destancestors and obsoletenotrebased is because they are now calculated on the fly instead of upfront. It seems like that might be slower, but I haven't spent time thinking about under what circumstances it might be. Have you? Could it be noticeable even in the worst case you can think of, or would it be completely drowned by rebase's other costs (such as writing revlogs and working copy files)?

> IIUC, the reason you no longer need to pass destancestors and obsoletenotrebased is because they are now calculated on the f ly instead of upfront.


> It seems like that might be slower

Practically, not necessarily. The new code uses the C ancestor code in revlog.c, which is 20x faster than the Python ancestor code (https://phab.mercurial-scm.org/rHG5bae936764bb74aee13375a826e5da7b02990df1).

I think having less states here makes the code more maintainable.

For the speedup, I think the low-level `ancestor` could maintain a LRU ancestor cache and application could hint it what to cache. The cache object might be managed automatically (by the revlog.c code), or manually (by explicitly passing it to to `ancestor`). Besides, I had a non-trivial idea to make ancestor testing roughly `O(log N)` instead of `O(N)`. But those are non-trivial (involves native code) that I'd like to do them later.

> martinvonz wrote in rebase.py:1004
> Should this be a wrapper of cl.isancestor() instead? Looks like that's how you use it.

Line 1088 is not `isancestor`. I'm okay with having two functions here. Let me know what do you prefer.

> martinvonz wrote in rebase.py:1009
> nit: Is this for converting to a list or for making a copy (of something that's already a list)? If the latter, I feel like oldps[:}  is clearer.

Copy. Will change.

> martinvonz wrote in rebase.py:1010
> I think each step of the rebase effectively applies the diff in (rev - base) onto p1 (and p2 is only artificially added later to produce the graph we want), so I think we need to keep better track of the bases than using a set.
> Here's a test that will fail after this patch (the manifest will contain A as well):
>   $ hg debugdrawdag <<'EOF'
>   >   D
>   >  /|
>   > B C
>   >  \|
>   >   A E
>   >   |/
>   >   Q
>   > EOF
>   $ hg rebase -d E -r 'B + C + D'
>   $ hg manifest --rev tip
>   B
>   C
>   E
>   Q

Interesting. I didn't know that we treat p1 specially here. If so, users can always construct an asymmetric case that feels wrong. For example, if we use B or C as E's parent in the above graph, p1 may not get chosen in one of the case.

In this diff, it's Line 1091 trying to be conservative. I'll add tests and a warning like https://phab.mercurial-scm.org/D340 (Diff 770) line 1115.

> martinvonz wrote in rebase.py:1019
> This looks a bit weird. Would it work the same if you move it outside? Something like:
>   if oldps[1] == nullrev:
>     assert dests[1] == nullrev
>     newps = dests
>     if newps != oldps: # maybe always true?
>       bases.add(oldps[0])
>   else:
>     newps = oldps[:]
>     for i, p in enumerate(oldps):
>        ....
> Having written that and looked at adjustdest(), I see that the assertion would fail because adjustdest() would return [dest,dest] for a non-merge source. That seems surprising to me. Should adjustdest() be changed not to do that?

Maybe `assert i == 0` is a better assertion here. If `nullrev` is considered part of the DAG, then I think it makes sense for `adjustdest` to return `dest` for p2.

> martinvonz wrote in rebase.py:1025
> I don't quite understand the responsibility of adjustdest() vs this method. Why would or why would we not let adjustdest() handle the merge cases too instead of doing that here?

Good question. `adjustdest` was originally a replacement of `nearestrebased` and was only handling one destination. Then I found it has to deal with 2 parents differently so it became more complex.

Looking at its current usage at `clearrebased`, it's still different. `clearrebased` needs to just move bookmark to the adjusted destination. The code here need a double check before using that adjusted destination.

In theory, the code here could also just use the adjusted destination without checking. But as the comment above said, the merge commit may "lose a parent" surprisingly.

So I think the code here is to:

1. Decide which parent not to move (not obey the adjusted destination).
2. Handle the case when a parent is not moved to destination, will it still need to be moved?

For 2, an example is:

  B C   D

Rebasing A to D or E is an error since none of A's parents could be moved to those destinations. But if a user runs "rebase -s B+A -d D", then A needs to be moved to B'.

I'll add more comments here.

  rHG Mercurial


To: quark, durin42, #hg-reviewers
Cc: martinvonz, durin42, mercurial-devel
Mercurial-devel mailing list
[hidden email]