Quantcast

[PATCH 1 of 3] phases: add a getrevs method to phasecache

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH 1 of 3] phases: add a getrevs method to phasecache

Jun Wu
# HG changeset patch
# User Jun Wu <[hidden email]>
# Date 1486731527 28800
#      Fri Feb 10 04:58:47 2017 -0800
# Node ID 1d7a184bb013a1a6f6b92d1f6d89406a7254ba2b
# Parent  dd14ece7cb20d8f3c764b17d2edc9ee646a27f2f
# Available At https://bitbucket.org/quark-zju/hg-draft
#              hg pull https://bitbucket.org/quark-zju/hg-draft -r 1d7a184bb013
phases: add a getrevs method to phasecache

This is part of a refactoring that moves some phase query optimization from
revset.py to phases.py.

The motivation behind this is chg repo preloading - to preload the obsstore,
the calculation functions (obsolete.getrevs) need to be called. Currently those
functions use revsets (repo.revs), which is an overkill and requires a lot
of dependencies to be available on the repo object. The reality is, the
calculation only needs to query phase revisions. So let's make it use phases
instead of revset.

This patch adds a getrevs method so phasecache users could get a set of
revisions easily. Previously there is no API on phasecache to access
"_phasesets" and other code is accessing the private field "_phasesets"
directly.

diff --git a/mercurial/phases.py b/mercurial/phases.py
--- a/mercurial/phases.py
+++ b/mercurial/phases.py
@@ -171,4 +171,24 @@ class phasecache(object):
             self.opener = repo.svfs
 
+    def getrevs(self, repo, phases):
+        """return a set of revision numbers for the given phases"""
+        result = set()
+        slowphases = set(phases) # for slow path testing
+        self.loadphaserevs(repo) # ensure phase's sets are loaded
+
+        # fast path - use _phasesets
+        for phase in phases:
+            if self._phasesets and self._phasesets[phase] is not None:
+                result.update(self._phasesets[phase])
+                slowphases.remove(phase)
+        # slow path - enumerate all revisions
+        if slowphases:
+            result.update(r for r, p in enumerate(self._phaserevs)
+                          if p in slowphases)
+
+        if repo.changelog.filteredrevs:
+            result -= repo.changelog.filteredrevs
+        return result
+
     def copy(self):
         # Shallow copy meant to ensure isolation in
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH 2 of 3] revset: use phasecache.getrevs

Jun Wu
# HG changeset patch
# User Jun Wu <[hidden email]>
# Date 1486735766 28800
#      Fri Feb 10 06:09:26 2017 -0800
# Node ID 44f2c707475a79758ecbc4b4117b9cda9d7a4380
# Parent  1d7a184bb013a1a6f6b92d1f6d89406a7254ba2b
# Available At https://bitbucket.org/quark-zju/hg-draft
#              hg pull https://bitbucket.org/quark-zju/hg-draft -r 44f2c707475a
revset: use phasecache.getrevs

This is part of a refactoring that moves some phase query optimization from
revset.py to phases.py. See the previous patch for motivation.

This patch changes revset code to use phasecache.getrevs so it no longer
accesses private field: _phasecache.getrevs directly.

The change may make "not public()" slower in the pure version where there
are many non-public changesets, because of an unnecessary sort. But
performance should be unchanged for the default CPython version. Since
non-public changesets are not expected to be many, and most users use the
CPython version, the code clean-up seems worthy.

In the future, if we have a set-like C structured implemented by a bitmap,
we could get perf right - no longer need to use unordered Python set.

diff --git a/mercurial/revset.py b/mercurial/revset.py
--- a/mercurial/revset.py
+++ b/mercurial/revset.py
@@ -1645,17 +1645,10 @@ def parents(repo, subset, x):
     return subset & ps
 
-def _phase(repo, subset, target):
-    """helper to select all rev in phase <target>"""
-    repo._phasecache.loadphaserevs(repo) # ensure phase's sets are loaded
-    if repo._phasecache._phasesets:
-        s = repo._phasecache._phasesets[target] - repo.changelog.filteredrevs
-        s = baseset(s)
-        s.sort() # set are non ordered, so we enforce ascending
-        return subset & s
-    else:
-        phase = repo._phasecache.phase
-        condition = lambda r: phase(repo, r) == target
-        return subset.filter(condition, condrepr=('<phase %r>', target),
-                             cache=False)
+def _phase(repo, subset, *targets):
+    """helper to select all rev in <targets> phases"""
+    revs = repo._phasecache.getrevs(repo, targets)
+    s = baseset(revs)
+    s.sort()
+    return subset & s
 
 @predicate('draft()', safe=True)
@@ -1718,18 +1711,5 @@ def present(repo, subset, x):
 def _notpublic(repo, subset, x):
     getargs(x, 0, 0, "_notpublic takes no arguments")
-    repo._phasecache.loadphaserevs(repo) # ensure phase's sets are loaded
-    if repo._phasecache._phasesets:
-        s = set()
-        for u in repo._phasecache._phasesets[1:]:
-            s.update(u)
-        s = baseset(s - repo.changelog.filteredrevs)
-        s.sort()
-        return subset & s
-    else:
-        phase = repo._phasecache.phase
-        target = phases.public
-        condition = lambda r: phase(repo, r) != target
-        return subset.filter(condition, condrepr=('<phase %r>', target),
-                             cache=False)
+    return _phase(repo, subset, *phases.allphases[1:])
 
 @predicate('public()', safe=True)
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH 3 of 3] obsolete: avoid using revset to compute the obsolete set

Jun Wu
In reply to this post by Jun Wu
# HG changeset patch
# User Jun Wu <[hidden email]>
# Date 1486737071 28800
#      Fri Feb 10 06:31:11 2017 -0800
# Node ID a4a844d706453079ea69de94f5e1c6fec64710e9
# Parent  44f2c707475a79758ecbc4b4117b9cda9d7a4380
# Available At https://bitbucket.org/quark-zju/hg-draft
#              hg pull https://bitbucket.org/quark-zju/hg-draft -r a4a844d70645
obsolete: avoid using revset to compute the obsolete set

This is part of a refactoring that moves some phase query optimization from
revset.py to phases.py. See previous patches for motivation.

Now we have APIs to get the non-public set efficiently, replace revset with
it.

diff --git a/mercurial/obsolete.py b/mercurial/obsolete.py
--- a/mercurial/obsolete.py
+++ b/mercurial/obsolete.py
@@ -1121,5 +1121,5 @@ def _computeobsoleteset(repo):
     obs = set()
     getnode = repo.changelog.node
-    notpublic = repo.revs("not public()")
+    notpublic = repo._phasecache.getrevs(repo, phases.allphases[1:])
     for r in notpublic:
         if getnode(r) in repo.obsstore.successors:
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH 2 of 3] revset: use phasecache.getrevs

Jun Wu
In reply to this post by Jun Wu
Excerpts from Jun Wu's message of 2017-02-10 06:34:18 -0800:

> # HG changeset patch
> # User Jun Wu <[hidden email]>
> # Date 1486735766 28800
> #      Fri Feb 10 06:09:26 2017 -0800
> # Node ID 44f2c707475a79758ecbc4b4117b9cda9d7a4380
> # Parent  1d7a184bb013a1a6f6b92d1f6d89406a7254ba2b
> # Available At https://bitbucket.org/quark-zju/hg-draft 
> #              hg pull https://bitbucket.org/quark-zju/hg-draft  -r 44f2c707475a
> revset: use phasecache.getrevs
>
> This is part of a refactoring that moves some phase query optimization from
> revset.py to phases.py. See the previous patch for motivation.
>
> This patch changes revset code to use phasecache.getrevs so it no longer
> accesses private field: _phasecache.getrevs directly.
                          ^^^^^^^^^^^^^^^^^^^
                          This should be "_phasesets".

I probably want to add some hacky hooks to "hg email" to audit the commit
messages. Maybe just send them to some Text-To-Speech engine to double
confirm.

>
> The change may make "not public()" slower in the pure version where there
> are many non-public changesets, because of an unnecessary sort. But
> performance should be unchanged for the default CPython version. Since
> non-public changesets are not expected to be many, and most users use the
> CPython version, the code clean-up seems worthy.
>
> In the future, if we have a set-like C structured implemented by a bitmap,
> we could get perf right - no longer need to use unordered Python set.
>
> diff --git a/mercurial/revset.py b/mercurial/revset.py
> --- a/mercurial/revset.py
> +++ b/mercurial/revset.py
> @@ -1645,17 +1645,10 @@ def parents(repo, subset, x):
>      return subset & ps
>  
> -def _phase(repo, subset, target):
> -    """helper to select all rev in phase <target>"""
> -    repo._phasecache.loadphaserevs(repo) # ensure phase's sets are loaded
> -    if repo._phasecache._phasesets:
> -        s = repo._phasecache._phasesets[target] - repo.changelog.filteredrevs
> -        s = baseset(s)
> -        s.sort() # set are non ordered, so we enforce ascending
> -        return subset & s
> -    else:
> -        phase = repo._phasecache.phase
> -        condition = lambda r: phase(repo, r) == target
> -        return subset.filter(condition, condrepr=('<phase %r>', target),
> -                             cache=False)
> +def _phase(repo, subset, *targets):
> +    """helper to select all rev in <targets> phases"""
> +    revs = repo._phasecache.getrevs(repo, targets)
> +    s = baseset(revs)
> +    s.sort()
> +    return subset & s
>  
>  @predicate('draft()', safe=True)
> @@ -1718,18 +1711,5 @@ def present(repo, subset, x):
>  def _notpublic(repo, subset, x):
>      getargs(x, 0, 0, "_notpublic takes no arguments")
> -    repo._phasecache.loadphaserevs(repo) # ensure phase's sets are loaded
> -    if repo._phasecache._phasesets:
> -        s = set()
> -        for u in repo._phasecache._phasesets[1:]:
> -            s.update(u)
> -        s = baseset(s - repo.changelog.filteredrevs)
> -        s.sort()
> -        return subset & s
> -    else:
> -        phase = repo._phasecache.phase
> -        target = phases.public
> -        condition = lambda r: phase(repo, r) != target
> -        return subset.filter(condition, condrepr=('<phase %r>', target),
> -                             cache=False)
> +    return _phase(repo, subset, *phases.allphases[1:])
>  
>  @predicate('public()', safe=True)
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH 2 of 3] revset: use phasecache.getrevs

Yuya Nishihara
In reply to this post by Jun Wu
On Fri, 10 Feb 2017 06:34:18 -0800, Jun Wu wrote:
> # HG changeset patch
> # User Jun Wu <[hidden email]>
> # Date 1486735766 28800
> #      Fri Feb 10 06:09:26 2017 -0800
> # Node ID 44f2c707475a79758ecbc4b4117b9cda9d7a4380
> # Parent  1d7a184bb013a1a6f6b92d1f6d89406a7254ba2b
> # Available At https://bitbucket.org/quark-zju/hg-draft
> #              hg pull https://bitbucket.org/quark-zju/hg-draft -r 44f2c707475a
> revset: use phasecache.getrevs

> The change may make "not public()" slower in the pure version where there
> are many non-public changesets, because of an unnecessary sort. But
> performance should be unchanged for the default CPython version. Since
> non-public changesets are not expected to be many, and most users use the
> CPython version, the code clean-up seems worthy.

Perhaps you can make getrevs() return a lazy smartset object so the pure
version should have the same performance characteristic as before. I mean
getrevs(repo, target) could be identical to _phase(repo, subset, target) and
_notpublic(repo, subset, x) given subset=fullreposet.

The direction of this series looks nice.
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH 2 of 3] revset: use phasecache.getrevs

Jun Wu
Excerpts from Yuya Nishihara's message of 2017-02-13 23:56:20 +0900:

> On Fri, 10 Feb 2017 06:34:18 -0800, Jun Wu wrote:
> > # HG changeset patch
> > # User Jun Wu <[hidden email]>
> > # Date 1486735766 28800
> > #      Fri Feb 10 06:09:26 2017 -0800
> > # Node ID 44f2c707475a79758ecbc4b4117b9cda9d7a4380
> > # Parent  1d7a184bb013a1a6f6b92d1f6d89406a7254ba2b
> > # Available At https://bitbucket.org/quark-zju/hg-draft 
> > #              hg pull https://bitbucket.org/quark-zju/hg-draft  -r 44f2c707475a
> > revset: use phasecache.getrevs
>
> > The change may make "not public()" slower in the pure version where there
> > are many non-public changesets, because of an unnecessary sort. But
> > performance should be unchanged for the default CPython version. Since
> > non-public changesets are not expected to be many, and most users use the
> > CPython version, the code clean-up seems worthy.
>
> Perhaps you can make getrevs() return a lazy smartset object so the pure
> version should have the same performance characteristic as before. I mean
> getrevs(repo, target) could be identical to _phase(repo, subset, target) and
> _notpublic(repo, subset, x) given subset=fullreposet.
>
> The direction of this series looks nice.

Good advice! I didn't realize that smartset is decoupled from the repo
object.
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH 3 of 3] obsolete: avoid using revset to compute the obsolete set

Martin von Zweigbergk via Mercurial-devel
In reply to this post by Jun Wu
On Fri, Feb 10, 2017 at 6:34 AM, Jun Wu <[hidden email]> wrote:

> # HG changeset patch
> # User Jun Wu <[hidden email]>
> # Date 1486737071 28800
> #      Fri Feb 10 06:31:11 2017 -0800
> # Node ID a4a844d706453079ea69de94f5e1c6fec64710e9
> # Parent  44f2c707475a79758ecbc4b4117b9cda9d7a4380
> # Available At https://bitbucket.org/quark-zju/hg-draft
> #              hg pull https://bitbucket.org/quark-zju/hg-draft -r a4a844d70645
> obsolete: avoid using revset to compute the obsolete set
>
> This is part of a refactoring that moves some phase query optimization from
> revset.py to phases.py. See previous patches for motivation.
>
> Now we have APIs to get the non-public set efficiently, replace revset with
> it.
>
> diff --git a/mercurial/obsolete.py b/mercurial/obsolete.py
> --- a/mercurial/obsolete.py
> +++ b/mercurial/obsolete.py
> @@ -1121,5 +1121,5 @@ def _computeobsoleteset(repo):
>      obs = set()
>      getnode = repo.changelog.node
> -    notpublic = repo.revs("not public()")
> +    notpublic = repo._phasecache.getrevs(repo, phases.allphases[1:])

nit: phases.allphases[1:] is also called phases.trackedphases, if you like

>      for r in notpublic:
>          if getnode(r) in repo.obsstore.successors:
> _______________________________________________
> Mercurial-devel mailing list
> [hidden email]
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH 2 of 3] revset: use phasecache.getrevs

Jun Wu
In reply to this post by Yuya Nishihara
Excerpts from Yuya Nishihara's message of 2017-02-13 23:56:20 +0900:

> On Fri, 10 Feb 2017 06:34:18 -0800, Jun Wu wrote:
> > # HG changeset patch
> > # User Jun Wu <[hidden email]>
> > # Date 1486735766 28800
> > #      Fri Feb 10 06:09:26 2017 -0800
> > # Node ID 44f2c707475a79758ecbc4b4117b9cda9d7a4380
> > # Parent  1d7a184bb013a1a6f6b92d1f6d89406a7254ba2b
> > # Available At https://bitbucket.org/quark-zju/hg-draft 
> > #              hg pull https://bitbucket.org/quark-zju/hg-draft  -r 44f2c707475a
> > revset: use phasecache.getrevs
>
> > The change may make "not public()" slower in the pure version where there
> > are many non-public changesets, because of an unnecessary sort. But
> > performance should be unchanged for the default CPython version. Since
> > non-public changesets are not expected to be many, and most users use the
> > CPython version, the code clean-up seems worthy.
>
> Perhaps you can make getrevs() return a lazy smartset object so the pure
> version should have the same performance characteristic as before. I mean
> getrevs(repo, target) could be identical to _phase(repo, subset, target) and
> _notpublic(repo, subset, x) given subset=fullreposet.

Actually, smartset.baseset will do "data = list(data)" if data is a set.
If we use baseset, the "set" to "list" conversion is an unwanted overhead.

Maybe we can be even smarter: lazily initializing baseset._list.

>
> The direction of this series looks nice.
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Loading...