Quantcast

[PATCH 1 of 8] cache: make the cache updated callback easily accessible to extension

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
21 messages Options
12
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH 1 of 8] cache: make the cache updated callback easily accessible to extension

Pierre-Yves David-2
# HG changeset patch
# User Pierre-Yves David <[hidden email]>
# Date 1495192163 -7200
#      Fri May 19 13:09:23 2017 +0200
# Node ID 1fe18fcc122b4d8e14264a3670d2d5f99f11bd75
# Parent  8a87bfc5bebbbe0ac996ac8e047a029eb931af45
# EXP-Topic obscache
# Available At https://www.mercurial-scm.org/repo/users/marmoute/mercurial/
#              hg pull https://www.mercurial-scm.org/repo/users/marmoute/mercurial/ -r 1fe18fcc122b
cache: make the cache updated callback easily accessible to extension

This will help extension to benefit from this new logic. As a side effect this
clarify the 'transaction' method a little bit.

diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -1091,10 +1091,7 @@ class localrepository(object):
                                **pycompat.strkwargs(hookargs))
             reporef()._afterlock(hook)
         tr.addfinalize('txnclose-hook', txnclosehook)
-        def warmscache(tr2):
-            repo = reporef()
-            repo.updatecaches(tr2)
-        tr.addpostclose('warms-cache', warmscache)
+        tr.addpostclose('warms-cache', self._buildcacheupdater(tr))
         def txnaborthook(tr2):
             """To be run if transaction is aborted
             """
@@ -1229,6 +1226,20 @@ class localrepository(object):
         self.destroyed()
         return 0
 
+    def _buildcacheupdater(self, newtransaction):
+        """called during transaction to build the callback updating cache
+
+        Lives on the repository to help extension who might want to augment
+        this logic. For this purpose, the created transaction is passed to the
+        method.
+        """
+        # we must avoid cyclic reference between repo and transaction.
+        reporef = weakref.ref(self)
+        def updater(tr):
+            repo = reporef()
+            repo.updatecaches(tr)
+        return updater
+
     @unfilteredmethod
     def updatecaches(self, tr=None):
         """warm appropriate caches
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH 2 of 8] obsolete: move the 'isenabled' function at the top of the file

Pierre-Yves David-2
# HG changeset patch
# User Pierre-Yves David <[hidden email]>
# Date 1495192362 -7200
#      Fri May 19 13:12:42 2017 +0200
# Node ID 0aed424624f8c7905951d964579df2088b82abc8
# Parent  1fe18fcc122b4d8e14264a3670d2d5f99f11bd75
# EXP-Topic obscache
# Available At https://www.mercurial-scm.org/repo/users/marmoute/mercurial/
#              hg pull https://www.mercurial-scm.org/repo/users/marmoute/mercurial/ -r 0aed424624f8
obsolete: move the 'isenabled' function at the top of the file

That is a simple and important function so having it at the top next to the
related constant seems better.

diff --git a/mercurial/obsolete.py b/mercurial/obsolete.py
--- a/mercurial/obsolete.py
+++ b/mercurial/obsolete.py
@@ -95,6 +95,27 @@ createmarkersopt = 'createmarkers'
 allowunstableopt = 'allowunstable'
 exchangeopt = 'exchange'
 
+def isenabled(repo, option):
+    """Returns True if the given repository has the given obsolete option
+    enabled.
+    """
+    result = set(repo.ui.configlist('experimental', 'evolution'))
+    if 'all' in result:
+        return True
+
+    # For migration purposes, temporarily return true if the config hasn't been
+    # set but _enabled is true.
+    if len(result) == 0 and _enabled:
+        return True
+
+    # createmarkers must be enabled if other options are enabled
+    if ((allowunstableopt in result or exchangeopt in result) and
+        not createmarkersopt in result):
+        raise error.Abort(_("'createmarkers' obsolete option must be enabled "
+                           "if other obsolete options are enabled"))
+
+    return option in result
+
 ### obsolescence marker flag
 
 ## bumpedfix flag
@@ -1261,24 +1282,3 @@ def createmarkers(repo, relations, flag=
         tr.close()
     finally:
         tr.release()
-
-def isenabled(repo, option):
-    """Returns True if the given repository has the given obsolete option
-    enabled.
-    """
-    result = set(repo.ui.configlist('experimental', 'evolution'))
-    if 'all' in result:
-        return True
-
-    # For migration purposes, temporarily return true if the config hasn't been
-    # set but _enabled is true.
-    if len(result) == 0 and _enabled:
-        return True
-
-    # createmarkers must be enabled if other options are enabled
-    if ((allowunstableopt in result or exchangeopt in result) and
-        not createmarkersopt in result):
-        raise error.Abort(_("'createmarkers' obsolete option must be enabled "
-                           "if other obsolete options are enabled"))
-
-    return option in result
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH 3 of 8] obsstore: add a 'cachekey' method

Pierre-Yves David-2
In reply to this post by Pierre-Yves David-2
# HG changeset patch
# User Pierre-Yves David <[hidden email]>
# Date 1495191830 -7200
#      Fri May 19 13:03:50 2017 +0200
# Node ID c67d3438f0d1ee56437ab0cffd49b02f441f876e
# Parent  0aed424624f8c7905951d964579df2088b82abc8
# EXP-Topic obscache
# Available At https://www.mercurial-scm.org/repo/users/marmoute/mercurial/
#              hg pull https://www.mercurial-scm.org/repo/users/marmoute/mercurial/ -r c67d3438f0d1
obsstore: add a 'cachekey' method

Parsing the full obsstore is slow, so cache that depends on obsstore content
wants a way to know if the obsstore changed, and it this change was append only.

For this purpose we introduce an official cachekey for the obsstore. This cache
key work in a way similar to the '(tiprev, tipnode)' pair used for the
changelog. We use the size of the obsstore file and the hash of its tail. That
way, we can check if the obsstore grew and if the content we knew is still
present in the obsstore.

This will be used in later changeset to cache related to the obsolete property.

diff --git a/mercurial/obsolete.py b/mercurial/obsolete.py
--- a/mercurial/obsolete.py
+++ b/mercurial/obsolete.py
@@ -70,6 +70,7 @@ comment associated with each format for
 from __future__ import absolute_import
 
 import errno
+import hashlib
 import struct
 
 from .i18n import _
@@ -547,6 +548,8 @@ class obsstore(object):
     # parents: (tuple of nodeid) or None, parents of precursors
     #          None is used when no data has been recorded
 
+    _obskeysize = 200
+
     def __init__(self, svfs, defaultformat=_fm1version, readonly=False):
         # caches for various obsolescence related cache
         self.caches = {}
@@ -574,6 +577,46 @@ class obsstore(object):
 
     __bool__ = __nonzero__
 
+    def cachekey(self, index=None):
+        """return (current-length, cachekey)
+
+        'current-length': is the current length of the obsstore storage file,
+        'cachekey' is the hash of the last 200 bytes ending at 'index'.
+
+        If 'index' is unspecified, current obsstore length is used.
+        Cacheckey will be set to nullid if the obsstore is empty.
+        'current-lenght' is -always- the current obsstore length, regardless of
+        the 'index' value.
+
+        If the index specified is higher than the current obsstore file
+        length, cachekey will be set to None."""
+        # default value
+        obsstoresize = 0
+        keydata = ''
+        # try to get actual data from the obsstore
+        try:
+            with self.svfs('obsstore') as obsfile:
+                obsfile.seek(0, 2)
+                obsstoresize = obsfile.tell()
+                if index is None:
+                    index = obsstoresize
+                elif obsstoresize < index:
+                    return obsstoresize, None
+                actualsize = min(index, self._obskeysize)
+                if actualsize:
+                    obsfile.seek(index - actualsize, 0)
+                    keydata = obsfile.read(actualsize)
+        except (OSError, IOError) as e:
+            if e.errno != errno.ENOENT:
+                raise
+        if keydata:
+            key = hashlib.sha1(keydata).digest()
+        else:
+            # reusing an existing "empty" value make it easier to define a
+            # default cachekey for 'no data'.
+            key = node.nullid
+        return obsstoresize, key
+
     @property
     def readonly(self):
         """True if marker creation is disabled
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH 4 of 8] obscache: add an abstract base class for changelog+obstore cache

Pierre-Yves David-2
In reply to this post by Pierre-Yves David-2
# HG changeset patch
# User Pierre-Yves David <[hidden email]>
# Date 1495194829 -7200
#      Fri May 19 13:53:49 2017 +0200
# Node ID b5c984cd0640255b94f22d56a4bfe398b2c5de2e
# Parent  c67d3438f0d1ee56437ab0cffd49b02f441f876e
# EXP-Topic obscache
# Available At https://www.mercurial-scm.org/repo/users/marmoute/mercurial/
#              hg pull https://www.mercurial-scm.org/repo/users/marmoute/mercurial/ -r b5c984cd0640
obscache: add an abstract base class for changelog+obstore cache

There are many useful properties to cache around obsolescence. Some of them
needs data from both the changelog and the obsstore. We needs logic handle to
track changes and strip to both data structure are the same time. We also wants
these to allow incremental update when possible instead of recomputing the whole
set all the time.

As this is a common needs, we start by adding an abstract class reusable by
multiple cache.

This code was initially part of the evolve extensions and matured there.

diff --git a/mercurial/obsolete.py b/mercurial/obsolete.py
--- a/mercurial/obsolete.py
+++ b/mercurial/obsolete.py
@@ -1141,6 +1141,228 @@ def successorssets(repo, initialnode, ca
                 cache[current] = final
     return cache[initialnode]
 
+class dualsourcecache(object):
+    """An abstract class for cache that needs both changelog and obsstore
+
+    This class handle the tracking of changelog and obsstore update. It provide
+    data to performs incremental update (see the 'updatefrom' function for
+    details).  This class can also detect stripping of the changelog or the
+    obsstore and can reset the cache in this cache (see the 'clear' function
+    for details).
+
+    For this purpose it use a cache key covering changelog and obsstore
+    content:
+
+    The cache key parts are:
+    - tip-rev,
+    - tip-node,
+    - obsstore-length (nb markers),
+    - obsstore-file-size (in bytes),
+    - obsstore "cache key"
+    """
+
+    # default key used for an empty cache
+    emptykey = (node.nullrev, node.nullid, 0, 0, node.nullid)
+    _cachename = None # used for debug message
+
+    def __init__(self):
+        super(dualsourcecache, self).__init__()
+        self._cachekey = None
+
+    @property
+    def _tiprev(self):
+        """small utility to make the code less error prone"""
+        if self._cachekey is None:
+            return None
+        return self._cachekey[0]
+
+    @property
+    def _obssize(self):
+        """small utility to make the code less error prone"""
+        if self._cachekey is None:
+            return None
+        return self._cachekey[3]
+
+    def _updatefrom(self, repo, revs, obsmarkers):
+        """override this method to update your cache data incrementally
+
+        revs:      list of new revision in the changelog
+        obsmarker: list of new obsmarkers in the obsstore
+        """
+        raise NotImplementedError
+
+    def clear(self, reset=False):
+        """invalidate the cache content
+
+        if 'reset' is passed, we detected a strip and the cache will have to be
+        recomputed.
+
+        Subclasses MUST overide this method to actually affect the cache data.
+        """
+        if reset:
+            self._cachekey = self.emptykey if reset else None
+        else:
+            self._cachekey = None
+
+    def load(self, repo):
+        """Load data from disk
+
+        Subclasses MUST restore the "cachekey" attribute while doing so.
+        """
+        raise NotImplementedError
+
+    # Useful public function (no need to override them)
+
+    def uptodate(self, repo):
+        """return True if the cache content is up to date False otherwise
+
+        This method can be used to detect of the cache is lagging behind new
+        data in either changelog or obsstore.
+        """
+        repo = repo.unfiltered()
+        if self._cachekey is None:
+            self.load(repo)
+        status = self._checkkey(repo.changelog, repo.obsstore)
+        return (status is not None
+                and status[0] == self._tiprev
+                and status[1] == self._obssize)
+
+    def update(self, repo):
+        """update the cache with new repository data
+
+        The update will be incremental when possible"""
+        repo = repo.unfiltered()
+        # If we do not have any data, try loading from disk
+        if self._cachekey is None:
+            self.load(repo)
+
+        cl = repo.changelog
+
+        upgrade = self._upgradeneeded(repo)
+        if upgrade is None:
+            return
+
+        reset, revs, obsmarkers, obskeypair = upgrade
+        if reset or self._cachekey is None:
+            repo.ui.log('cache', 'strip detected, %s cache reset\n'
+                        % self._cachename)
+            self.clear(reset=True)
+
+        starttime = util.timer()
+        self._updatefrom(repo, revs, obsmarkers)
+        duration = util.timer() - starttime
+        repo.ui.log('cache', 'updated %s in %.4f seconds (%sr, %so)\n',
+                    self._cachename, duration, len(revs), len(obsmarkers))
+
+        # update the key from the new data
+        key = list(self._cachekey)
+        if revs:
+            # tiprev
+            key[0] = revs[-1]
+            # tipnode
+            key[1] = cl.node(key[0])
+        if obsmarkers:
+            # obsstore length (nb obsmarker)
+            key[2] += len(obsmarkers)
+            # obsstore size + key
+            key[3], key[4] = obskeypair
+        self._cachekey = tuple(key)
+
+    # from here, there are internal function only
+
+    def _checkkey(self, changelog, obsstore):
+        """key current key against changelog and obsstore
+
+        return:
+        - None: when the key is invalid (no key, strip detect, etc),
+        - (current-tiprev, current-obsszie, current-obskey) otherwise.
+        """
+        key = self._cachekey
+        if key is None:
+            return None
+
+        ### Is the cache valid ?
+        keytiprev, keytipnode, keyobslength, keyobssize, keyobskey = key
+        # check for changelog strip
+        tiprev = len(changelog) - 1
+        if (tiprev < keytiprev
+                or changelog.node(keytiprev) != keytipnode):
+            return None
+        # check for obsstore strip
+        obssize, obskey = obsstore.cachekey(index=keyobssize)
+        if obskey != keyobskey:
+            return None
+        if obssize != keyobssize:
+            # we want to return the obskey for the new size
+            __, obskey = obsstore.cachekey(index=obssize)
+        return tiprev, obssize, obskey
+
+    def _upgradeneeded(self, repo):
+        """return (reset, revs, markers, (obssize, obskey)) or None
+
+        If 'None' is returned, the cache is up to date, no upgrade are needed.
+
+        'reset': True if the current data must be destroyed (strip detected),
+
+        'revs': full list of new revisions that the cache must take in account,
+        'markers': full list of new markers the cache must take in account,
+        '(obssize, obskey)': the new obs-cachekey that match the new markers.
+
+        note: the new 'tiprev' can be computed using 'max(revs)' or revs[-1].
+        """
+
+        # We need to ensure we use the same changelog and obsstore through the
+        # processing. Otherwise some invalidation could update the object and
+        # their content after we computed the cache key.
+        cl = repo.changelog
+        obsstore = repo.obsstore
+
+        reset = False
+
+        status = self._checkkey(cl, obsstore)
+        if status is None:
+            reset = True
+            key = self.emptykey
+            obssize, obskey = obsstore.cachekey()
+            tiprev = len(cl) - 1
+        else:
+            key = self._cachekey
+            tiprev, obssize, obskey = status
+        keytiprev = key[0]
+        keyobslength = key[2]
+        keyobssize = key[3]
+
+        if not reset and keytiprev == tiprev and keyobssize == obssize:
+            return None # nothing to upgrade
+
+        ### cache is valid, is there anything to update
+
+        # any new changesets ?
+        revs = ()
+        if keytiprev < tiprev:
+            revs = list(cl.revs(start=keytiprev + 1, stop=tiprev))
+
+        # any new markers
+        markers = ()
+        if keyobssize < obssize:
+            # XXX Three are a small race change here. Since the obsstore might
+            # have move forward between the time we computed the cache key and
+            # we access the data. To fix this we need so "up to" argument when
+            # fetching the markers here. Otherwise we might return more markers
+            # than covered by the cache key.
+            #
+            # In practice the cache target usage only update in a transaction
+            # context, with the lock taken. So we should be fine.
+
+            # XXX we currently naively access all markers on the obsstore for
+            # the sake of simplicity. This will be updated to ways that does
+            # not requires to load the whole obsstore in the future.
+            markers = list(obsstore)
+            if keyobslength:
+                markers = markers[keyobslength:]
+
+        return reset, revs, markers, (obssize, obskey)
+
 # mapping of 'set-name' -> <function to compute this set>
 cachefuncs = {}
 def cachefor(name):
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH 5 of 8] obscache: add a cache for 1/2 of the "obsolete" property

Pierre-Yves David-2
In reply to this post by Pierre-Yves David-2
# HG changeset patch
# User Pierre-Yves David <[hidden email]>
# Date 1495197986 -7200
#      Fri May 19 14:46:26 2017 +0200
# Node ID c2b1c81b3c4cba8cc21b1a1b92f4e34e6aa1807a
# Parent  b5c984cd0640255b94f22d56a4bfe398b2c5de2e
# EXP-Topic obscache
# Available At https://www.mercurial-scm.org/repo/users/marmoute/mercurial/
#              hg pull https://www.mercurial-scm.org/repo/users/marmoute/mercurial/ -r c2b1c81b3c4c
obscache: add a cache for 1/2 of the "obsolete" property

Knowing if a changeset is obsolete requires two data:

1) is the change non-public,
2) is the changeset affected by any obsolescence markers.

The phase related data has some fast implementation already. However, the
obsmarkers based property currently requires to parse the whole obsstore, a slow
operation.

That information is monotonic (new changeset are not affected, and once they are
affected, they will be for ever), making it is easy to cache. We introduce a new
class dedicated to this information. That first implementation still needs to
parse the full obsstore when updating for the sake of simplicity. It will be
improved later to allow lighter upgrade.

The next changesets will put this new cache to use.

That code is coming from the evolve extension, were it matured. To keep this
changeset simple, there are a couple of improvement in the extension that will
be ported later.

diff --git a/mercurial/obsolete.py b/mercurial/obsolete.py
--- a/mercurial/obsolete.py
+++ b/mercurial/obsolete.py
@@ -1363,6 +1363,138 @@ class dualsourcecache(object):
 
         return reset, revs, markers, (obssize, obskey)
 
+class obscache(dualsourcecache):
+    """cache "does a rev is the precursors of some obsmarkers" property
+
+    This is not directly holding the "is this revision obsolete" information,
+    because phases data gets into play here. However, it allow to compute the
+    "obsolescence" set without reading the obsstore content.
+
+    The cache use a bytearray to store that data and simply write it on disk
+    for storage.
+
+    Implementation note #1:
+
+      The obsstore is implementing only half of the transaction logic it
+      should. It properly record the starting point of the obsstore to allow
+      clean rollback. However it still write to the obsstore file directly
+      during the transaction. Instead it should be keeping data in memory and
+      write to a '.pending' file to make the data vailable for hooks.
+
+      This cache is not going further than what the obsstore is doing, so it
+      does not has any '.pending' logic. When the obsstore gains proper
+      '.pending' support, adding it to this cache should not be too hard. As
+      the flag always move from 0 to 1, we could have a second '.pending' cache
+      file to be read. If flag is set in any of them, the value is 1. For the
+      same reason, updating the file in place should be possible.
+
+    Implementation note #2:
+
+        Storage-wise, we could have a "start rev" to avoid storing useless
+        zero. That would be especially useful for the '.pending' overlay.
+    """
+
+    _filepath = 'cache/obscache-v01'
+    _headerformat = '>q20sQQ20s'
+
+    _cachename = 'obscache' # used for error message
+
+    def __init__(self, repo):
+        super(obscache, self).__init__()
+        self._ondiskkey = None
+        self._vfs = repo.vfs
+        self._data = bytearray()
+
+    def get(self, rev):
+        """return True if "rev" is used as "precursors for any obsmarkers
+
+        IMPORTANT: make sure the cache has been updated to match the repository
+        content before using it"""
+        return self._data[rev]
+
+    def clear(self, reset=False):
+        """invalidate the in memory cache content"""
+        super(obscache, self).clear(reset=reset)
+        self._data = bytearray()
+
+    def _updatefrom(self, repo, revs, obsmarkers):
+        if revs:
+            self._updaterevs(repo, revs)
+        if obsmarkers:
+            self._updatemarkers(repo, obsmarkers)
+
+    def _updaterevs(self, repo, revs):
+        """update the cache with new revisions
+
+        Newly added changesets might be affected by obsolescence markers we
+        already have locally. So we needs to have some global knowledge about
+        the markers to handle that question.
+
+        XXX performance note:
+
+        Right now this requires parsing all markers in the obsstore. We could
+        imagine using various optimisation (eg: another cache, network
+        exchange, etc).
+
+        A possible approach to this is to build a set of all nodes used as
+        precursors in `obsstore._obscandidate`. If markers are not loaded yet,
+        we could initialize it by doing a quick scan through the obsstore data
+        and filling a (pre-sized) set. Doing so would be much faster than
+        parsing all the obsmarkers since we would access less data, not create
+        any object beside the nodes and not have to decode any complex data.
+
+        For now we stick to the simpler approach of paying the
+        performance cost on new changesets.
+        """
+        node = repo.changelog.node
+        succs = repo.obsstore.successors
+        for r in revs:
+            val = int(node(r) in succs)
+            self._data.append(val)
+        cl = repo.changelog
+        assert len(self._data) == len(cl), (len(self._data), len(cl))
+
+    def _updatemarkers(self, repo, obsmarkers):
+        """update the cache with new markers"""
+        rev = repo.changelog.nodemap.get
+        for m in obsmarkers:
+            r = rev(m[0])
+            if r is not None:
+                self._data[r] = 1
+
+    def save(self, repo):
+        """save the data to disk
+
+        Format is pretty simple, we serialise the cache key and then drop the
+        bytearray.
+        """
+
+        # XXX we'll need some pending related logic when the obsstore get it
+
+        if self._cachekey is None or self._cachekey == self._ondiskkey:
+            return
+
+        cachefile = repo.vfs(self._filepath, 'w', atomictemp=True)
+        headerdata = struct.pack(self._headerformat, *self._cachekey)
+        cachefile.write(headerdata)
+        cachefile.write(self._data)
+        cachefile.close()
+
+    def load(self, repo):
+        """load data from disk"""
+        assert repo.filtername is None
+
+        data = repo.vfs.tryread(self._filepath)
+        if not data:
+            self._cachekey = self.emptykey
+            self._data = bytearray()
+        else:
+            headersize = struct.calcsize(self._headerformat)
+            self._cachekey = struct.unpack(self._headerformat,
+                                           data[:headersize])
+            self._data = bytearray(data[headersize:])
+        self._ondiskkey = self._cachekey
+
 # mapping of 'set-name' -> <function to compute this set>
 cachefuncs = {}
 def cachefor(name):
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH 6 of 8] obsstore: pass a repository object for initialisation

Pierre-Yves David-2
In reply to this post by Pierre-Yves David-2
# HG changeset patch
# User Pierre-Yves David <[hidden email]>
# Date 1495197862 -7200
#      Fri May 19 14:44:22 2017 +0200
# Node ID 762eaca9a0234992c50b2929a91a0ba24b93bab4
# Parent  c2b1c81b3c4cba8cc21b1a1b92f4e34e6aa1807a
# EXP-Topic obscache
# Available At https://www.mercurial-scm.org/repo/users/marmoute/mercurial/
#              hg pull https://www.mercurial-scm.org/repo/users/marmoute/mercurial/ -r 762eaca9a023
obsstore: pass a repository object for initialisation

The cache will needs a repository object (to grab a 'vfs'), so we pass a repo object instead of
just the 'svfs' and we grab the 'svfs' from there.

diff --git a/contrib/perf.py b/contrib/perf.py
--- a/contrib/perf.py
+++ b/contrib/perf.py
@@ -1217,8 +1217,7 @@ def perfloadmarkers(ui, repo):
 
     Result is the number of markers in the repo."""
     timer, fm = gettimer(ui)
-    svfs = getsvfs(repo)
-    timer(lambda: len(obsolete.obsstore(svfs)))
+    timer(lambda: len(obsolete.obsstore(repo)))
     fm.end()
 
 @command('perflrucachedict', formatteropts +
diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -518,7 +518,7 @@ class localrepository(object):
         if defaultformat is not None:
             kwargs['defaultformat'] = defaultformat
         readonly = not obsolete.isenabled(self, obsolete.createmarkersopt)
-        store = obsolete.obsstore(self.svfs, readonly=readonly,
+        store = obsolete.obsstore(self, readonly=readonly,
                                   **kwargs)
         if store and readonly:
             self.ui.warn(
diff --git a/mercurial/obsolete.py b/mercurial/obsolete.py
--- a/mercurial/obsolete.py
+++ b/mercurial/obsolete.py
@@ -550,10 +550,10 @@ class obsstore(object):
 
     _obskeysize = 200
 
-    def __init__(self, svfs, defaultformat=_fm1version, readonly=False):
+    def __init__(self, repo, defaultformat=_fm1version, readonly=False):
         # caches for various obsolescence related cache
         self.caches = {}
-        self.svfs = svfs
+        self.svfs = repo.svfs
         self._version = defaultformat
         self._readonly = readonly
 
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH 7 of 8] obscache: instantiate the cache and keep it warm

Pierre-Yves David-2
In reply to this post by Pierre-Yves David-2
# HG changeset patch
# User Pierre-Yves David <[hidden email]>
# Date 1495198006 -7200
#      Fri May 19 14:46:46 2017 +0200
# Node ID f41a5dbded63f3c721456187b88c49cd90f5ad53
# Parent  762eaca9a0234992c50b2929a91a0ba24b93bab4
# EXP-Topic obscache
# Available At https://www.mercurial-scm.org/repo/users/marmoute/mercurial/
#              hg pull https://www.mercurial-scm.org/repo/users/marmoute/mercurial/ -r f41a5dbded63
obscache: instantiate the cache and keep it warm

We are not using it yet, but we make sure we have a cache and that we keep it up
to date after each transaction.

diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -1258,6 +1258,9 @@ class localrepository(object):
             self.ui.debug('updating the branch cache\n')
             branchmap.updatecache(self.filtered('served'))
 
+        self.obsstore.obscache.update(self)
+        self.obsstore.obscache.save(self)
+
     def invalidatecaches(self):
 
         if '_tagscache' in vars(self):
diff --git a/mercurial/obsolete.py b/mercurial/obsolete.py
--- a/mercurial/obsolete.py
+++ b/mercurial/obsolete.py
@@ -556,6 +556,7 @@ class obsstore(object):
         self.svfs = repo.svfs
         self._version = defaultformat
         self._readonly = readonly
+        self.obscache = obscache(repo)
 
     def __iter__(self):
         return iter(self._all)
diff --git a/tests/test-blackbox.t b/tests/test-blackbox.t
--- a/tests/test-blackbox.t
+++ b/tests/test-blackbox.t
@@ -59,10 +59,10 @@ clone, commit, pull
   added 1 changesets with 1 changes to 1 files
   (run 'hg update' to get a working copy)
   $ hg blackbox -l 6
-  1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> pull
   1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> updated served branch cache in * seconds (glob)
   1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> wrote served branch cache with 1 labels and 2 nodes
-  1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> 1 incoming changes - new heads: d02f48003e62
+  1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> updated obscache in * seconds (3r, 0o) (glob)
+  1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> 1 incoming changes - new heads: d02f48003e62 (glob)
   1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> pull exited 0 after * seconds (glob)
   1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> blackbox -l 6
 
@@ -121,8 +121,8 @@ backup bundles get logged
   0 files updated, 0 files merged, 1 files removed, 0 files unresolved
   saved backup bundle to $TESTTMP/blackboxtest2/.hg/strip-backup/*-backup.hg (glob)
   $ hg blackbox -l 6
-  1970/01/01 00:00:00 bob @73f6ee326b27d820b0472f1a825e3a50f3dc489b (5000)> strip tip
-  1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> saved backup bundle to $TESTTMP/blackboxtest2/.hg/strip-backup/73f6ee326b27-7612e004-backup.hg (glob)
+  1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> strip detected, obscache cache reset
+  1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> updated obscache in * seconds (3r, 0o) (glob)
   1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> updated base branch cache in * seconds (glob)
   1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> wrote base branch cache with 1 labels and 2 nodes
   1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> strip tip exited 0 after * seconds (glob)
@@ -194,9 +194,9 @@ log rotation
   
   result: None
   $ hg blackbox
-  1970/01/01 00:00:00 bob @0e46349438790c460c5c9f7546bfcd39b267bbd2 (5000)> commit -m commit2 -d 2000-01-02 foo
   1970/01/01 00:00:00 bob @45589e459b2edfbf3dbde7e01f611d2c1e7453d7 (5000)> updated served branch cache in * seconds (glob)
   1970/01/01 00:00:00 bob @45589e459b2edfbf3dbde7e01f611d2c1e7453d7 (5000)> wrote served branch cache with 1 labels and 1 nodes
+  1970/01/01 00:00:00 bob @45589e459b2edfbf3dbde7e01f611d2c1e7453d7 (5000)> updated obscache in * seconds (1r, 0o) (glob)
   1970/01/01 00:00:00 bob @45589e459b2edfbf3dbde7e01f611d2c1e7453d7 (5000)> commit -m commit2 -d 2000-01-02 foo exited 0 after * seconds (glob)
   1970/01/01 00:00:00 bob @45589e459b2edfbf3dbde7e01f611d2c1e7453d7 (5000)> log -r 0
   1970/01/01 00:00:00 bob @45589e459b2edfbf3dbde7e01f611d2c1e7453d7 (5000)> writing .hg/cache/tags2-visible with 0 tags
diff --git a/tests/test-clone.t b/tests/test-clone.t
--- a/tests/test-clone.t
+++ b/tests/test-clone.t
@@ -35,6 +35,7 @@ Trigger branchcache creation:
   checklink (symlink !)
   checklink-target (symlink !)
   checknoexec (execbit !)
+  obscache-v01
   rbc-names-v1
   rbc-revs-v1
 
diff --git a/tests/test-debugcommands.t b/tests/test-debugcommands.t
--- a/tests/test-debugcommands.t
+++ b/tests/test-debugcommands.t
@@ -118,6 +118,7 @@ Test cache warming command
   $ ls -r .hg/cache/*
   .hg/cache/rbc-revs-v1
   .hg/cache/rbc-names-v1
+  .hg/cache/obscache-v01
   .hg/cache/branch2-served
 
   $ cd ..
diff --git a/tests/test-fncache.t b/tests/test-fncache.t
--- a/tests/test-fncache.t
+++ b/tests/test-fncache.t
@@ -86,6 +86,7 @@ Non store repo:
   .hg/00manifest.i
   .hg/cache
   .hg/cache/branch2-served
+  .hg/cache/obscache-v01
   .hg/cache/rbc-names-v1
   .hg/cache/rbc-revs-v1
   .hg/data
@@ -118,6 +119,7 @@ Non fncache repo:
   .hg/00changelog.i
   .hg/cache
   .hg/cache/branch2-served
+  .hg/cache/obscache-v01
   .hg/cache/rbc-names-v1
   .hg/cache/rbc-revs-v1
   .hg/dirstate
diff --git a/tests/test-hardlinks.t b/tests/test-hardlinks.t
--- a/tests/test-hardlinks.t
+++ b/tests/test-hardlinks.t
@@ -233,6 +233,7 @@ r4 has hardlinks in the working dir (not
   2 r4/.hg/cache/checkisexec
   2 r4/.hg/cache/checklink-target
   2 r4/.hg/cache/checknoexec
+  2 r4/.hg/cache/obscache-v01
   2 r4/.hg/cache/rbc-names-v1
   2 r4/.hg/cache/rbc-revs-v1
   2 r4/.hg/dirstate
@@ -282,6 +283,7 @@ Update back to revision 11 in r4 should
   2 r4/.hg/cache/checkisexec
   2 r4/.hg/cache/checklink-target
   2 r4/.hg/cache/checknoexec
+  2 r4/.hg/cache/obscache-v01
   2 r4/.hg/cache/rbc-names-v1
   2 r4/.hg/cache/rbc-revs-v1
   1 r4/.hg/dirstate
diff --git a/tests/test-inherit-mode.t b/tests/test-inherit-mode.t
--- a/tests/test-inherit-mode.t
+++ b/tests/test-inherit-mode.t
@@ -66,6 +66,7 @@ new directories are setgid
   00600 ./.hg/00changelog.i
   00770 ./.hg/cache/
   00660 ./.hg/cache/branch2-served
+  00660 ./.hg/cache/obscache-v01
   00660 ./.hg/cache/rbc-names-v1
   00660 ./.hg/cache/rbc-revs-v1
   00660 ./.hg/dirstate
@@ -115,6 +116,7 @@ group can still write everything
   00660 ../push/.hg/00changelog.i
   00770 ../push/.hg/cache/
   00660 ../push/.hg/cache/branch2-base
+  00660 ../push/.hg/cache/obscache-v01
   00660 ../push/.hg/dirstate
   00660 ../push/.hg/requires
   00770 ../push/.hg/store/
diff --git a/tests/test-tags.t b/tests/test-tags.t
--- a/tests/test-tags.t
+++ b/tests/test-tags.t
@@ -537,6 +537,7 @@ Strip 1: expose an old head:
 
   $ hg --config extensions.mq= strip 5
   saved backup bundle to $TESTTMP/t3/.hg/strip-backup/*-backup.hg (glob)
+  warning: ignoring unknown working parent 735c3ca72986!
   $ hg tags                  # partly stale cache
   tip                                5:735c3ca72986
   bar                                1:78391a272241
@@ -548,6 +549,7 @@ Strip 2: destroy whole branch, no old he
 
   $ hg --config extensions.mq= strip 4
   saved backup bundle to $TESTTMP/t3/.hg/strip-backup/*-backup.hg (glob)
+  warning: ignoring unknown working parent 735c3ca72986!
   $ hg tags                  # partly stale
   tip                                4:735c3ca72986
   bar                                0:bbd179dfa0a7
@@ -679,6 +681,7 @@ Missing tags2* files means the cache was
   checklink
   checklink-target
   hgtagsfnodes1
+  obscache-v01
 
 Cache should contain the head only, even though other nodes have tags data
 
@@ -707,6 +710,7 @@ Running hg tags should produce tags2* fi
   checklink
   checklink-target
   hgtagsfnodes1
+  obscache-v01
   tags2-visible
 
   $ f --size --hexdump tagsclient/.hg/cache/hgtagsfnodes1
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH 8 of 8] obscache: use the obscache to compute the obsolete set

Pierre-Yves David-2
In reply to this post by Pierre-Yves David-2
# HG changeset patch
# User Pierre-Yves David <[hidden email]>
# Date 1495198021 -7200
#      Fri May 19 14:47:01 2017 +0200
# Node ID f1082c7b4a2eb56f9bddfabacc7c5b3fefd19c75
# Parent  f41a5dbded63f3c721456187b88c49cd90f5ad53
# EXP-Topic obscache
# Available At https://www.mercurial-scm.org/repo/users/marmoute/mercurial/
#              hg pull https://www.mercurial-scm.org/repo/users/marmoute/mercurial/ -r f1082c7b4a2e
obscache: use the obscache to compute the obsolete set

Now that we have a cache and that the cache is kept up to date, we can use it to
speeds up the obsolete set computation. This way, we no longer need to load the
obsstore for most operation.

On the mercurial-core repository, this provide a significant speed up:

Running "hg  id -r ."
- before: 0.630 second (0.56s user 0.06s system 99% cpu 0.630)
- after:  0.129 second (0.11s user 0.02s system 98% cpu 0.129)

The and obsstore loading operation disapear from execution profile.


To keep the changeset simple it the handling of case were
the cache has not been kept up to date is pretty simple. That might introduce a
small performance impact during the transition in some case. This will get
improved in later changeset.

In addition the cache still needs to parse the full obsstore when updating.
There as known way to skip parsing the full obsstore for wrote operation too.
This will also get improved later.

diff --git a/mercurial/obsolete.py b/mercurial/obsolete.py
--- a/mercurial/obsolete.py
+++ b/mercurial/obsolete.py
@@ -1539,10 +1539,26 @@ def clearobscaches(repo):
 def _computeobsoleteset(repo):
     """the set of obsolete revisions"""
     obs = set()
-    getnode = repo.changelog.node
     notpublic = repo._phasecache.getrevset(repo, (phases.draft, phases.secret))
+    if not notpublic:
+        # all changeset are public, none are obsolete
+        return obs
+
+    # XXX There are a couple of case where the cache could not be up to date:
+    #
+    # 1) no transaction happened in the repository since the upgrade,
+    # 2) both old and new client touches that repository
+    #
+    # recomputing the whole cache in these case is a bit slower that using the
+    # good old version (parsing markers and checking them). We could add some
+    # logic to fall back to the old way in these cases.
+    obscache = repo.obsstore.obscache
+    obscache.update(repo) # ensure it is up to date:
+    isobs = obscache.get
+
+    # actually compute the obsolete set
     for r in notpublic:
-        if getnode(r) in repo.obsstore.successors:
+        if isobs(r):
             obs.add(r)
     return obs
 
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH 4 of 8] obscache: add an abstract base class for changelog+obstore cache

Augie Fackler-2
In reply to this post by Pierre-Yves David-2
On Fri, May 19, 2017 at 04:28:03PM +0200, Pierre-Yves David wrote:

> # HG changeset patch
> # User Pierre-Yves David <[hidden email]>
> # Date 1495194829 -7200
> #      Fri May 19 13:53:49 2017 +0200
> # Node ID b5c984cd0640255b94f22d56a4bfe398b2c5de2e
> # Parent  c67d3438f0d1ee56437ab0cffd49b02f441f876e
> # EXP-Topic obscache
> # Available At https://www.mercurial-scm.org/repo/users/marmoute/mercurial/
> #              hg pull https://www.mercurial-scm.org/repo/users/marmoute/mercurial/ -r b5c984cd0640
> obscache: add an abstract base class for changelog+obstore cache
>
> There are many useful properties to cache around obsolescence. Some of them
> needs data from both the changelog and the obsstore. We needs logic handle to
> track changes and strip to both data structure are the same time. We also wants
> these to allow incremental update when possible instead of recomputing the whole
> set all the time.
>
> As this is a common needs, we start by adding an abstract class reusable by
> multiple cache.
>
> This code was initially part of the evolve extensions and matured there.
>
> diff --git a/mercurial/obsolete.py b/mercurial/obsolete.py
> --- a/mercurial/obsolete.py
> +++ b/mercurial/obsolete.py
> @@ -1141,6 +1141,228 @@ def successorssets(repo, initialnode, ca
>                  cache[current] = final
>      return cache[initialnode]
>
> +class dualsourcecache(object):

I'm not in love with this name, as it sounds more generic than it
is. I might also have some nits about it being an abstract class
depending on the future patches.

> +    """An abstract class for cache that needs both changelog and obsstore
> +
> +    This class handle the tracking of changelog and obsstore update. It provide
> +    data to performs incremental update (see the 'updatefrom' function for
> +    details).  This class can also detect stripping of the changelog or the
> +    obsstore and can reset the cache in this cache (see the 'clear' function
> +    for details).
> +
> +    For this purpose it use a cache key covering changelog and obsstore
> +    content:
> +
> +    The cache key parts are:
> +    - tip-rev,
> +    - tip-node,
> +    - obsstore-length (nb markers),
> +    - obsstore-file-size (in bytes),
> +    - obsstore "cache key"
> +    """
> +
> +    # default key used for an empty cache
> +    emptykey = (node.nullrev, node.nullid, 0, 0, node.nullid)
> +    _cachename = None # used for debug message
> +
> +    def __init__(self):
> +        super(dualsourcecache, self).__init__()
> +        self._cachekey = None
> +
> +    @property
> +    def _tiprev(self):
> +        """small utility to make the code less error prone"""
> +        if self._cachekey is None:
> +            return None
> +        return self._cachekey[0]
> +
> +    @property
> +    def _obssize(self):
> +        """small utility to make the code less error prone"""
> +        if self._cachekey is None:
> +            return None
> +        return self._cachekey[3]
> +
> +    def _updatefrom(self, repo, revs, obsmarkers):
> +        """override this method to update your cache data incrementally
> +
> +        revs:      list of new revision in the changelog
> +        obsmarker: list of new obsmarkers in the obsstore
> +        """
> +        raise NotImplementedError

Nit: could we move all the abstract parts of the interface to the top
of the class so it's easier to find what has to be implemented to get
a concrete implementation?

> +
> +    def clear(self, reset=False):
> +        """invalidate the cache content
> +
> +        if 'reset' is passed, we detected a strip and the cache will have to be
> +        recomputed.
> +
> +        Subclasses MUST overide this method to actually affect the cache data.
> +        """
> +        if reset:
> +            self._cachekey = self.emptykey if reset else None
> +        else:
> +            self._cachekey = None
> +
> +    def load(self, repo):
> +        """Load data from disk
> +
> +        Subclasses MUST restore the "cachekey" attribute while doing so.
> +        """
> +        raise NotImplementedError
> +
> +    # Useful public function (no need to override them)
> +
> +    def uptodate(self, repo):
> +        """return True if the cache content is up to date False otherwise
> +
> +        This method can be used to detect of the cache is lagging behind new
> +        data in either changelog or obsstore.
> +        """
> +        repo = repo.unfiltered()
> +        if self._cachekey is None:
> +            self.load(repo)
> +        status = self._checkkey(repo.changelog, repo.obsstore)
> +        return (status is not None
> +                and status[0] == self._tiprev
> +                and status[1] == self._obssize)
> +
> +    def update(self, repo):
> +        """update the cache with new repository data
> +
> +        The update will be incremental when possible"""
> +        repo = repo.unfiltered()
> +        # If we do not have any data, try loading from disk
> +        if self._cachekey is None:
> +            self.load(repo)
> +
> +        cl = repo.changelog
> +
> +        upgrade = self._upgradeneeded(repo)
> +        if upgrade is None:
> +            return
> +
> +        reset, revs, obsmarkers, obskeypair = upgrade
> +        if reset or self._cachekey is None:
> +            repo.ui.log('cache', 'strip detected, %s cache reset\n'
> +                        % self._cachename)
> +            self.clear(reset=True)
> +
> +        starttime = util.timer()
> +        self._updatefrom(repo, revs, obsmarkers)
> +        duration = util.timer() - starttime
> +        repo.ui.log('cache', 'updated %s in %.4f seconds (%sr, %so)\n',
> +                    self._cachename, duration, len(revs), len(obsmarkers))
> +
> +        # update the key from the new data
> +        key = list(self._cachekey)
> +        if revs:
> +            # tiprev
> +            key[0] = revs[-1]
> +            # tipnode
> +            key[1] = cl.node(key[0])
> +        if obsmarkers:
> +            # obsstore length (nb obsmarker)
> +            key[2] += len(obsmarkers)
> +            # obsstore size + key
> +            key[3], key[4] = obskeypair
> +        self._cachekey = tuple(key)
> +
> +    # from here, there are internal function only
> +
> +    def _checkkey(self, changelog, obsstore):
> +        """key current key against changelog and obsstore
> +
> +        return:
> +        - None: when the key is invalid (no key, strip detect, etc),
> +        - (current-tiprev, current-obsszie, current-obskey) otherwise.
> +        """
> +        key = self._cachekey
> +        if key is None:
> +            return None
> +
> +        ### Is the cache valid ?
> +        keytiprev, keytipnode, keyobslength, keyobssize, keyobskey = key
> +        # check for changelog strip
> +        tiprev = len(changelog) - 1
> +        if (tiprev < keytiprev
> +                or changelog.node(keytiprev) != keytipnode):
> +            return None
> +        # check for obsstore strip
> +        obssize, obskey = obsstore.cachekey(index=keyobssize)
> +        if obskey != keyobskey:
> +            return None
> +        if obssize != keyobssize:
> +            # we want to return the obskey for the new size
> +            __, obskey = obsstore.cachekey(index=obssize)
> +        return tiprev, obssize, obskey
> +
> +    def _upgradeneeded(self, repo):
> +        """return (reset, revs, markers, (obssize, obskey)) or None
> +
> +        If 'None' is returned, the cache is up to date, no upgrade are needed.
> +
> +        'reset': True if the current data must be destroyed (strip detected),
> +
> +        'revs': full list of new revisions that the cache must take in account,
> +        'markers': full list of new markers the cache must take in account,
> +        '(obssize, obskey)': the new obs-cachekey that match the new markers.
> +
> +        note: the new 'tiprev' can be computed using 'max(revs)' or revs[-1].
> +        """
> +
> +        # We need to ensure we use the same changelog and obsstore through the
> +        # processing. Otherwise some invalidation could update the object and
> +        # their content after we computed the cache key.
> +        cl = repo.changelog
> +        obsstore = repo.obsstore
> +
> +        reset = False
> +
> +        status = self._checkkey(cl, obsstore)
> +        if status is None:
> +            reset = True
> +            key = self.emptykey
> +            obssize, obskey = obsstore.cachekey()
> +            tiprev = len(cl) - 1
> +        else:
> +            key = self._cachekey
> +            tiprev, obssize, obskey = status
> +        keytiprev = key[0]
> +        keyobslength = key[2]
> +        keyobssize = key[3]
> +
> +        if not reset and keytiprev == tiprev and keyobssize == obssize:
> +            return None # nothing to upgrade
> +
> +        ### cache is valid, is there anything to update
> +
> +        # any new changesets ?
> +        revs = ()
> +        if keytiprev < tiprev:
> +            revs = list(cl.revs(start=keytiprev + 1, stop=tiprev))
> +
> +        # any new markers
> +        markers = ()
> +        if keyobssize < obssize:
> +            # XXX Three are a small race change here. Since the obsstore might
> +            # have move forward between the time we computed the cache key and
> +            # we access the data. To fix this we need so "up to" argument when
> +            # fetching the markers here. Otherwise we might return more markers
> +            # than covered by the cache key.
> +            #
> +            # In practice the cache target usage only update in a transaction
> +            # context, with the lock taken. So we should be fine.
> +
> +            # XXX we currently naively access all markers on the obsstore for
> +            # the sake of simplicity. This will be updated to ways that does
> +            # not requires to load the whole obsstore in the future.
> +            markers = list(obsstore)
> +            if keyobslength:
> +                markers = markers[keyobslength:]
> +
> +        return reset, revs, markers, (obssize, obskey)
> +
>  # mapping of 'set-name' -> <function to compute this set>
>  cachefuncs = {}
>  def cachefor(name):
> _______________________________________________
> Mercurial-devel mailing list
> [hidden email]
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH 5 of 8] obscache: add a cache for 1/2 of the "obsolete" property

Augie Fackler-2
In reply to this post by Pierre-Yves David-2
On Fri, May 19, 2017 at 04:28:04PM +0200, Pierre-Yves David wrote:

> # HG changeset patch
> # User Pierre-Yves David <[hidden email]>
> # Date 1495197986 -7200
> #      Fri May 19 14:46:26 2017 +0200
> # Node ID c2b1c81b3c4cba8cc21b1a1b92f4e34e6aa1807a
> # Parent  b5c984cd0640255b94f22d56a4bfe398b2c5de2e
> # EXP-Topic obscache
> # Available At https://www.mercurial-scm.org/repo/users/marmoute/mercurial/
> #              hg pull https://www.mercurial-scm.org/repo/users/marmoute/mercurial/ -r c2b1c81b3c4c
> obscache: add a cache for 1/2 of the "obsolete" property

So, can you point me to other (upcoming) concrete subclasses of
dualsourcecache, or is this the only one?

Also, could we put these big new classes in obscache.py or some other
new file? I'm wary of adding more code to the already-1200-lines
obsolete.py, and since these are coming from evolve I'm hoping that
they're still easily separable from obsolete.py...

>
> Knowing if a changeset is obsolete requires two data:
>
> 1) is the change non-public,
> 2) is the changeset affected by any obsolescence markers.
>
> The phase related data has some fast implementation already. However, the
> obsmarkers based property currently requires to parse the whole obsstore, a slow
> operation.
>
> That information is monotonic (new changeset are not affected, and once they are
> affected, they will be for ever), making it is easy to cache. We introduce a new
> class dedicated to this information. That first implementation still needs to
> parse the full obsstore when updating for the sake of simplicity. It will be
> improved later to allow lighter upgrade.
>
> The next changesets will put this new cache to use.
>
> That code is coming from the evolve extension, were it matured. To keep this
> changeset simple, there are a couple of improvement in the extension that will
> be ported later.
>
> diff --git a/mercurial/obsolete.py b/mercurial/obsolete.py
> --- a/mercurial/obsolete.py
> +++ b/mercurial/obsolete.py
> @@ -1363,6 +1363,138 @@ class dualsourcecache(object):
>
>          return reset, revs, markers, (obssize, obskey)
>
> +class obscache(dualsourcecache):
> +    """cache "does a rev is the precursors of some obsmarkers" property
> +
> +    This is not directly holding the "is this revision obsolete" information,
> +    because phases data gets into play here. However, it allow to compute the
> +    "obsolescence" set without reading the obsstore content.
> +
> +    The cache use a bytearray to store that data and simply write it on disk
> +    for storage.
> +
> +    Implementation note #1:
> +
> +      The obsstore is implementing only half of the transaction logic it
> +      should. It properly record the starting point of the obsstore to allow
> +      clean rollback. However it still write to the obsstore file directly
> +      during the transaction. Instead it should be keeping data in memory and
> +      write to a '.pending' file to make the data vailable for hooks.
> +
> +      This cache is not going further than what the obsstore is doing, so it
> +      does not has any '.pending' logic. When the obsstore gains proper
> +      '.pending' support, adding it to this cache should not be too hard. As
> +      the flag always move from 0 to 1, we could have a second '.pending' cache
> +      file to be read. If flag is set in any of them, the value is 1. For the
> +      same reason, updating the file in place should be possible.
> +
> +    Implementation note #2:
> +
> +        Storage-wise, we could have a "start rev" to avoid storing useless
> +        zero. That would be especially useful for the '.pending' overlay.
> +    """
> +
> +    _filepath = 'cache/obscache-v01'
> +    _headerformat = '>q20sQQ20s'
> +
> +    _cachename = 'obscache' # used for error message
> +
> +    def __init__(self, repo):
> +        super(obscache, self).__init__()
> +        self._ondiskkey = None
> +        self._vfs = repo.vfs
> +        self._data = bytearray()
> +
> +    def get(self, rev):
> +        """return True if "rev" is used as "precursors for any obsmarkers
> +
> +        IMPORTANT: make sure the cache has been updated to match the repository
> +        content before using it"""
> +        return self._data[rev]
> +
> +    def clear(self, reset=False):
> +        """invalidate the in memory cache content"""
> +        super(obscache, self).clear(reset=reset)
> +        self._data = bytearray()
> +
> +    def _updatefrom(self, repo, revs, obsmarkers):
> +        if revs:
> +            self._updaterevs(repo, revs)
> +        if obsmarkers:
> +            self._updatemarkers(repo, obsmarkers)
> +
> +    def _updaterevs(self, repo, revs):
> +        """update the cache with new revisions
> +
> +        Newly added changesets might be affected by obsolescence markers we
> +        already have locally. So we needs to have some global knowledge about
> +        the markers to handle that question.
> +
> +        XXX performance note:
> +
> +        Right now this requires parsing all markers in the obsstore. We could
> +        imagine using various optimisation (eg: another cache, network
> +        exchange, etc).
> +
> +        A possible approach to this is to build a set of all nodes used as
> +        precursors in `obsstore._obscandidate`. If markers are not loaded yet,
> +        we could initialize it by doing a quick scan through the obsstore data
> +        and filling a (pre-sized) set. Doing so would be much faster than
> +        parsing all the obsmarkers since we would access less data, not create
> +        any object beside the nodes and not have to decode any complex data.
> +
> +        For now we stick to the simpler approach of paying the
> +        performance cost on new changesets.
> +        """
> +        node = repo.changelog.node
> +        succs = repo.obsstore.successors
> +        for r in revs:
> +            val = int(node(r) in succs)
> +            self._data.append(val)
> +        cl = repo.changelog
> +        assert len(self._data) == len(cl), (len(self._data), len(cl))
> +
> +    def _updatemarkers(self, repo, obsmarkers):
> +        """update the cache with new markers"""
> +        rev = repo.changelog.nodemap.get
> +        for m in obsmarkers:
> +            r = rev(m[0])
> +            if r is not None:
> +                self._data[r] = 1
> +
> +    def save(self, repo):
> +        """save the data to disk
> +
> +        Format is pretty simple, we serialise the cache key and then drop the
> +        bytearray.
> +        """
> +
> +        # XXX we'll need some pending related logic when the obsstore get it
> +
> +        if self._cachekey is None or self._cachekey == self._ondiskkey:
> +            return
> +
> +        cachefile = repo.vfs(self._filepath, 'w', atomictemp=True)
> +        headerdata = struct.pack(self._headerformat, *self._cachekey)
> +        cachefile.write(headerdata)
> +        cachefile.write(self._data)
> +        cachefile.close()
> +
> +    def load(self, repo):
> +        """load data from disk"""
> +        assert repo.filtername is None
> +
> +        data = repo.vfs.tryread(self._filepath)
> +        if not data:
> +            self._cachekey = self.emptykey
> +            self._data = bytearray()
> +        else:
> +            headersize = struct.calcsize(self._headerformat)
> +            self._cachekey = struct.unpack(self._headerformat,
> +                                           data[:headersize])
> +            self._data = bytearray(data[headersize:])
> +        self._ondiskkey = self._cachekey
> +
>  # mapping of 'set-name' -> <function to compute this set>
>  cachefuncs = {}
>  def cachefor(name):
> _______________________________________________
> Mercurial-devel mailing list
> [hidden email]
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH 8 of 8] obscache: use the obscache to compute the obsolete set

Augie Fackler-2
In reply to this post by Pierre-Yves David-2
On Fri, May 19, 2017 at 04:28:07PM +0200, Pierre-Yves David wrote:

> # HG changeset patch
> # User Pierre-Yves David <[hidden email]>
> # Date 1495198021 -7200
> #      Fri May 19 14:47:01 2017 +0200
> # Node ID f1082c7b4a2eb56f9bddfabacc7c5b3fefd19c75
> # Parent  f41a5dbded63f3c721456187b88c49cd90f5ad53
> # EXP-Topic obscache
> # Available At https://www.mercurial-scm.org/repo/users/marmoute/mercurial/
> #              hg pull https://www.mercurial-scm.org/repo/users/marmoute/mercurial/ -r f1082c7b4a2e
> obscache: use the obscache to compute the obsolete set

Very excited about this series, took the first two patches, but I had
some nits on some of the middle patches.

>
> Now that we have a cache and that the cache is kept up to date, we can use it to
> speeds up the obsolete set computation. This way, we no longer need to load the
> obsstore for most operation.
>
> On the mercurial-core repository, this provide a significant speed up:
>
> Running "hg  id -r ."
> - before: 0.630 second (0.56s user 0.06s system 99% cpu 0.630)
> - after:  0.129 second (0.11s user 0.02s system 98% cpu 0.129)
>
> The and obsstore loading operation disapear from execution profile.
>
>
> To keep the changeset simple it the handling of case were
> the cache has not been kept up to date is pretty simple. That might introduce a
> small performance impact during the transition in some case. This will get
> improved in later changeset.
>
> In addition the cache still needs to parse the full obsstore when updating.
> There as known way to skip parsing the full obsstore for wrote operation too.
> This will also get improved later.
>
> diff --git a/mercurial/obsolete.py b/mercurial/obsolete.py
> --- a/mercurial/obsolete.py
> +++ b/mercurial/obsolete.py
> @@ -1539,10 +1539,26 @@ def clearobscaches(repo):
>  def _computeobsoleteset(repo):
>      """the set of obsolete revisions"""
>      obs = set()
> -    getnode = repo.changelog.node
>      notpublic = repo._phasecache.getrevset(repo, (phases.draft, phases.secret))
> +    if not notpublic:
> +        # all changeset are public, none are obsolete
> +        return obs
> +
> +    # XXX There are a couple of case where the cache could not be up to date:
> +    #
> +    # 1) no transaction happened in the repository since the upgrade,
> +    # 2) both old and new client touches that repository
> +    #
> +    # recomputing the whole cache in these case is a bit slower that using the
> +    # good old version (parsing markers and checking them). We could add some
> +    # logic to fall back to the old way in these cases.
> +    obscache = repo.obsstore.obscache
> +    obscache.update(repo) # ensure it is up to date:
> +    isobs = obscache.get
> +
> +    # actually compute the obsolete set
>      for r in notpublic:
> -        if getnode(r) in repo.obsstore.successors:
> +        if isobs(r):
>              obs.add(r)
>      return obs
>
> _______________________________________________
> Mercurial-devel mailing list
> [hidden email]
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH 4 of 8] obscache: add an abstract base class for changelog+obstore cache

Pierre-Yves David-2
In reply to this post by Augie Fackler-2


On 05/19/2017 11:50 PM, Augie Fackler wrote:

> On Fri, May 19, 2017 at 04:28:03PM +0200, Pierre-Yves David wrote:
>> # HG changeset patch
>> # User Pierre-Yves David <[hidden email]>
>> # Date 1495194829 -7200
>> #      Fri May 19 13:53:49 2017 +0200
>> # Node ID b5c984cd0640255b94f22d56a4bfe398b2c5de2e
>> # Parent  c67d3438f0d1ee56437ab0cffd49b02f441f876e
>> # EXP-Topic obscache
>> # Available At https://www.mercurial-scm.org/repo/users/marmoute/mercurial/
>> #              hg pull https://www.mercurial-scm.org/repo/users/marmoute/mercurial/ -r b5c984cd0640
>> obscache: add an abstract base class for changelog+obstore cache
>>
>> There are many useful properties to cache around obsolescence. Some of them
>> needs data from both the changelog and the obsstore. We needs logic handle to
>> track changes and strip to both data structure are the same time. We also wants
>> these to allow incremental update when possible instead of recomputing the whole
>> set all the time.
>>
>> As this is a common needs, we start by adding an abstract class reusable by
>> multiple cache.
>>
>> This code was initially part of the evolve extensions and matured there.
>>
>> diff --git a/mercurial/obsolete.py b/mercurial/obsolete.py
>> --- a/mercurial/obsolete.py
>> +++ b/mercurial/obsolete.py
>> @@ -1141,6 +1141,228 @@ def successorssets(repo, initialnode, ca
>>                  cache[current] = final
>>      return cache[initialnode]
>>
>> +class dualsourcecache(object):
>
> I'm not in love with this name, as it sounds more generic than it
> is. I might also have some nits about it being an abstract class
> depending on the future patches.

I'm not in love with that name too. It is the best reasonable thing that
I could come with.

Alternative name proposal welcome.

>> +    """An abstract class for cache that needs both changelog and obsstore
>> +
>> +    This class handle the tracking of changelog and obsstore update. It provide
>> +    data to performs incremental update (see the 'updatefrom' function for
>> +    details).  This class can also detect stripping of the changelog or the
>> +    obsstore and can reset the cache in this cache (see the 'clear' function
>> +    for details).
>> +
>> +    For this purpose it use a cache key covering changelog and obsstore
>> +    content:
>> +
>> +    The cache key parts are:
>> +    - tip-rev,
>> +    - tip-node,
>> +    - obsstore-length (nb markers),
>> +    - obsstore-file-size (in bytes),
>> +    - obsstore "cache key"
>> +    """
>> +
>> +    # default key used for an empty cache
>> +    emptykey = (node.nullrev, node.nullid, 0, 0, node.nullid)
>> +    _cachename = None # used for debug message
>> +
>> +    def __init__(self):
>> +        super(dualsourcecache, self).__init__()
>> +        self._cachekey = None
>> +
>> +    @property
>> +    def _tiprev(self):
>> +        """small utility to make the code less error prone"""
>> +        if self._cachekey is None:
>> +            return None
>> +        return self._cachekey[0]
>> +
>> +    @property
>> +    def _obssize(self):
>> +        """small utility to make the code less error prone"""
>> +        if self._cachekey is None:
>> +            return None
>> +        return self._cachekey[3]
>> +
>> +    def _updatefrom(self, repo, revs, obsmarkers):
>> +        """override this method to update your cache data incrementally
>> +
>> +        revs:      list of new revision in the changelog
>> +        obsmarker: list of new obsmarkers in the obsstore
>> +        """
>> +        raise NotImplementedError
>
> Nit: could we move all the abstract parts of the interface to the top
> of the class so it's easier to find what has to be implemented to get
> a concrete implementation?

It actually was… until I added the two property helper without thinking
about it (tiprev, obssize). I'll fix that in V2. (you can ignore this
two in the meantime and assume the rest is ordered as you wished).

Cheers,

--
Pierre-Yves David
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH 5 of 8] obscache: add a cache for 1/2 of the "obsolete" property

Pierre-Yves David-2
In reply to this post by Augie Fackler-2


On 05/19/2017 11:53 PM, Augie Fackler wrote:

> On Fri, May 19, 2017 at 04:28:04PM +0200, Pierre-Yves David wrote:
>> # HG changeset patch
>> # User Pierre-Yves David <[hidden email]>
>> # Date 1495197986 -7200
>> #      Fri May 19 14:46:26 2017 +0200
>> # Node ID c2b1c81b3c4cba8cc21b1a1b92f4e34e6aa1807a
>> # Parent  b5c984cd0640255b94f22d56a4bfe398b2c5de2e
>> # EXP-Topic obscache
>> # Available At https://www.mercurial-scm.org/repo/users/marmoute/mercurial/
>> #              hg pull https://www.mercurial-scm.org/repo/users/marmoute/mercurial/ -r c2b1c81b3c4c
>> obscache: add a cache for 1/2 of the "obsolete" property
>
> So, can you point me to other (upcoming) concrete subclasses of
> dualsourcecache, or is this the only one?

There are current a second user of this abstract class in evolve. It is
used for obsdiscovery[1]. And I expect to we'll experiment with more of
this class of cache in the near future.

[1]
https://www.mercurial-scm.org/repo/evolve/file/71d242e0fc1a/hgext3rd/evolve/obsdiscovery.py#l440

The logic in the cache is easy to screw up and the end result is really
need to use (implement save/load/reset/update(revs, obs)). Having a
class ready to use is -really- handy.

In addition, there are final stage of skip-obsstore optimization that
requires some rework and collaboration in core something hard to access
from an extension.

So, having this intermediate abstract class is a corner-stone of the
current contribution. (most of work in evolve 6.2 is actually about
extracting and reusing this logic)

> Also, could we put these big new classes in obscache.py or some other
> new file? I'm wary of adding more code to the already-1200-lines
> obsolete.py, and since these are coming from evolve I'm hoping that
> they're still easily separable from obsolete.py...

This crossed my mind when adding it. However, there are significant
improvement to that class arriving soon. These improvement will need to
access (and refactor) some lower logic around reading markers and al. I
think it is wiser to wait for the dust to settle after these upgrade and
cut where this is

(the version in evolve access some "_private" method and duplicates some
code)

Cheers,

--
Pierre-Yves David
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH 5 of 8] obscache: add a cache for 1/2 of the "obsolete" property

Augie Fackler-2

> On May 19, 2017, at 15:15, Pierre-Yves David <[hidden email]> wrote:
>
>
>
> On 05/19/2017 11:53 PM, Augie Fackler wrote:
>> On Fri, May 19, 2017 at 04:28:04PM +0200, Pierre-Yves David wrote:
>>> # HG changeset patch
>>> # User Pierre-Yves David <[hidden email]>
>>> # Date 1495197986 -7200
>>> #      Fri May 19 14:46:26 2017 +0200
>>> # Node ID c2b1c81b3c4cba8cc21b1a1b92f4e34e6aa1807a
>>> # Parent  b5c984cd0640255b94f22d56a4bfe398b2c5de2e
>>> # EXP-Topic obscache
>>> # Available At https://www.mercurial-scm.org/repo/users/marmoute/mercurial/
>>> #              hg pull https://www.mercurial-scm.org/repo/users/marmoute/mercurial/ -r c2b1c81b3c4c
>>> obscache: add a cache for 1/2 of the "obsolete" property
>>
>> So, can you point me to other (upcoming) concrete subclasses of
>> dualsourcecache, or is this the only one?
>
> There are current a second user of this abstract class in evolve. It is used for obsdiscovery[1]. And I expect to we'll experiment with more of this class of cache in the near future.
>
> [1] https://www.mercurial-scm.org/repo/evolve/file/71d242e0fc1a/hgext3rd/evolve/obsdiscovery.py#l440
>
> The logic in the cache is easy to screw up and the end result is really need to use (implement save/load/reset/update(revs, obs)). Having a class ready to use is -really- handy.

I wish it could be done without the use of inheritance, but it's fine-ish for now as an abstract class. Should it perhaps be using the abc module to be a more explicit base class?

(I haven't ever used abc, so "no" or "not now" is fine)

> In addition, there are final stage of skip-obsstore optimization that requires some rework and collaboration in core something hard to access from an extension.
>
> So, having this intermediate abstract class is a corner-stone of the current contribution. (most of work in evolve 6.2 is actually about extracting and reusing this logic)
>
>> Also, could we put these big new classes in obscache.py or some other
>> new file? I'm wary of adding more code to the already-1200-lines
>> obsolete.py, and since these are coming from evolve I'm hoping that
>> they're still easily separable from obsolete.py...
>
> This crossed my mind when adding it. However, there are significant improvement to that class arriving soon. These improvement will need to access (and refactor) some lower logic around reading markers and al. I think it is wiser to wait for the dust to settle after these upgrade and cut where this is

I'm not really satisfied with this answer. Can you split things up into a better organization scheme so we don't end up with more of a mess in this already far-too-large module? That'd make things easier for me to reason about.

> (the version in evolve access some "_private" method and duplicates some code)

^ this is a sign to me that some more refactoring should probably happen before this lands, as much as I want it now.

>
> Cheers,
>
> --
> Pierre-Yves David

_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH 4 of 8] obscache: add an abstract base class for changelog+obstore cache

Augie Fackler-2
In reply to this post by Pierre-Yves David-2

On May 19, 2017, at 15:15, Pierre-Yves David <[hidden email]> wrote:



On 05/19/2017 11:50 PM, Augie Fackler wrote:
On Fri, May 19, 2017 at 04:28:03PM +0200, Pierre-Yves David wrote:
# HG changeset patch
# User Pierre-Yves David <[hidden email]>
# Date 1495194829 -7200
#      Fri May 19 13:53:49 2017 +0200
# Node ID b5c984cd0640255b94f22d56a4bfe398b2c5de2e
# Parent  c67d3438f0d1ee56437ab0cffd49b02f441f876e
# EXP-Topic obscache
# Available At https://www.mercurial-scm.org/repo/users/marmoute/mercurial/
#              hg pull https://www.mercurial-scm.org/repo/users/marmoute/mercurial/ -r b5c984cd0640
obscache: add an abstract base class for changelog+obstore cache

There are many useful properties to cache around obsolescence. Some of them
needs data from both the changelog and the obsstore. We needs logic handle to
track changes and strip to both data structure are the same time. We also wants
these to allow incremental update when possible instead of recomputing the whole
set all the time.

As this is a common needs, we start by adding an abstract class reusable by
multiple cache.

This code was initially part of the evolve extensions and matured there.

diff --git a/mercurial/obsolete.py b/mercurial/obsolete.py
--- a/mercurial/obsolete.py
+++ b/mercurial/obsolete.py
@@ -1141,6 +1141,228 @@ def successorssets(repo, initialnode, ca
                cache[current] = final
    return cache[initialnode]

+class dualsourcecache(object):

I'm not in love with this name, as it sounds more generic than it
is. I might also have some nits about it being an abstract class
depending on the future patches.

I'm not in love with that name too. It is the best reasonable thing that I could come with.

Alternative name proposal welcome.

It's explicitly only useful for caches that are based on both changelog and obsstore, right? Not any two sources? Why not changelogandobscachebase?

_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH 4 of 8] obscache: add an abstract base class for changelog+obstore cache

Pierre-Yves David-2


On 05/20/2017 12:33 AM, Augie Fackler wrote:

>
>> On May 19, 2017, at 15:15, Pierre-Yves David
>> <[hidden email]
>> <mailto:[hidden email]>> wrote:
>>
>>
>>
>> On 05/19/2017 11:50 PM, Augie Fackler wrote:
>>> On Fri, May 19, 2017 at 04:28:03PM +0200, Pierre-Yves David wrote:
>>>> # HG changeset patch
>>>> # User Pierre-Yves David <[hidden email]
>>>> <mailto:[hidden email]>>
>>>> # Date 1495194829 -7200
>>>> #      Fri May 19 13:53:49 2017 +0200
>>>> # Node ID b5c984cd0640255b94f22d56a4bfe398b2c5de2e
>>>> # Parent  c67d3438f0d1ee56437ab0cffd49b02f441f876e
>>>> # EXP-Topic obscache
>>>> # Available At
>>>> https://www.mercurial-scm.org/repo/users/marmoute/mercurial/
>>>> #              hg pull
>>>> https://www.mercurial-scm.org/repo/users/marmoute/mercurial/ -r
>>>> b5c984cd0640
>>>> obscache: add an abstract base class for changelog+obstore cache
>>>>
>>>> There are many useful properties to cache around obsolescence. Some
>>>> of them
>>>> needs data from both the changelog and the obsstore. We needs logic
>>>> handle to
>>>> track changes and strip to both data structure are the same time. We
>>>> also wants
>>>> these to allow incremental update when possible instead of
>>>> recomputing the whole
>>>> set all the time.
>>>>
>>>> As this is a common needs, we start by adding an abstract class
>>>> reusable by
>>>> multiple cache.
>>>>
>>>> This code was initially part of the evolve extensions and matured there.
>>>>
>>>> diff --git a/mercurial/obsolete.py b/mercurial/obsolete.py
>>>> --- a/mercurial/obsolete.py
>>>> +++ b/mercurial/obsolete.py
>>>> @@ -1141,6 +1141,228 @@ def successorssets(repo, initialnode, ca
>>>>                 cache[current] = final
>>>>     return cache[initialnode]
>>>>
>>>> +class dualsourcecache(object):
>>>
>>> I'm not in love with this name, as it sounds more generic than it
>>> is. I might also have some nits about it being an abstract class
>>> depending on the future patches.
>>
>> I'm not in love with that name too. It is the best reasonable thing
>> that I could come with.
>>
>> Alternative name proposal welcome.
>
> It's explicitly only useful for caches that are based on both changelog
> and obsstore, right? Not any two sources? Why not changelogandobscachebase?

24 char name does not pass my "reasonable" filter.

It is not just any two class, but it leave in the obsolete.py file. that
gives a hint.

I though of "clobscachebase" but this is a bit cryptic, (obsclcachebase
is not much better)

Cheers,

--
Pierre-Yves David
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH 1 of 8] cache: make the cache updated callback easily accessible to extension

Yuya Nishihara
In reply to this post by Pierre-Yves David-2
On Fri, 19 May 2017 16:28:00 +0200, Pierre-Yves David wrote:

> # HG changeset patch
> # User Pierre-Yves David <[hidden email]>
> # Date 1495192163 -7200
> #      Fri May 19 13:09:23 2017 +0200
> # Node ID 1fe18fcc122b4d8e14264a3670d2d5f99f11bd75
> # Parent  8a87bfc5bebbbe0ac996ac8e047a029eb931af45
> # EXP-Topic obscache
> # Available At https://www.mercurial-scm.org/repo/users/marmoute/mercurial/
> #              hg pull https://www.mercurial-scm.org/repo/users/marmoute/mercurial/ -r 1fe18fcc122b
> cache: make the cache updated callback easily accessible to extension
>
> This will help extension to benefit from this new logic. As a side effect this
> clarify the 'transaction' method a little bit.
>
> diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
> --- a/mercurial/localrepo.py
> +++ b/mercurial/localrepo.py
> @@ -1091,10 +1091,7 @@ class localrepository(object):
>                                 **pycompat.strkwargs(hookargs))
>              reporef()._afterlock(hook)
>          tr.addfinalize('txnclose-hook', txnclosehook)
> -        def warmscache(tr2):
> -            repo = reporef()
> -            repo.updatecaches(tr2)
> -        tr.addpostclose('warms-cache', warmscache)
> +        tr.addpostclose('warms-cache', self._buildcacheupdater(tr))
>          def txnaborthook(tr2):
>              """To be run if transaction is aborted
>              """
> @@ -1229,6 +1226,20 @@ class localrepository(object):
>          self.destroyed()
>          return 0
>  
> +    def _buildcacheupdater(self, newtransaction):
> +        """called during transaction to build the callback updating cache
> +
> +        Lives on the repository to help extension who might want to augment
> +        this logic. For this purpose, the created transaction is passed to the
> +        method.
> +        """
> +        # we must avoid cyclic reference between repo and transaction.
> +        reporef = weakref.ref(self)
> +        def updater(tr):
> +            repo = reporef()
> +            repo.updatecaches(tr)
> +        return updater

Out of curiosity, why do we need this whole weakref business? IIUC, repo holds
no strong reference to the transaction object.
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH 1 of 8] cache: make the cache updated callback easily accessible to extension

Pierre-Yves David-2


On 05/20/2017 06:49 AM, Yuya Nishihara wrote:

> On Fri, 19 May 2017 16:28:00 +0200, Pierre-Yves David wrote:
>> # HG changeset patch
>> # User Pierre-Yves David <[hidden email]>
>> # Date 1495192163 -7200
>> #      Fri May 19 13:09:23 2017 +0200
>> # Node ID 1fe18fcc122b4d8e14264a3670d2d5f99f11bd75
>> # Parent  8a87bfc5bebbbe0ac996ac8e047a029eb931af45
>> # EXP-Topic obscache
>> # Available At https://www.mercurial-scm.org/repo/users/marmoute/mercurial/
>> #              hg pull https://www.mercurial-scm.org/repo/users/marmoute/mercurial/ -r 1fe18fcc122b
>> cache: make the cache updated callback easily accessible to extension
>>
>> This will help extension to benefit from this new logic. As a side effect this
>> clarify the 'transaction' method a little bit.
>>
>> diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
>> --- a/mercurial/localrepo.py
>> +++ b/mercurial/localrepo.py
>> @@ -1091,10 +1091,7 @@ class localrepository(object):
>>                                 **pycompat.strkwargs(hookargs))
>>              reporef()._afterlock(hook)
>>          tr.addfinalize('txnclose-hook', txnclosehook)
>> -        def warmscache(tr2):
>> -            repo = reporef()
>> -            repo.updatecaches(tr2)
>> -        tr.addpostclose('warms-cache', warmscache)
>> +        tr.addpostclose('warms-cache', self._buildcacheupdater(tr))
>>          def txnaborthook(tr2):
>>              """To be run if transaction is aborted
>>              """
>> @@ -1229,6 +1226,20 @@ class localrepository(object):
>>          self.destroyed()
>>          return 0
>>
>> +    def _buildcacheupdater(self, newtransaction):
>> +        """called during transaction to build the callback updating cache
>> +
>> +        Lives on the repository to help extension who might want to augment
>> +        this logic. For this purpose, the created transaction is passed to the
>> +        method.
>> +        """
>> +        # we must avoid cyclic reference between repo and transaction.
>> +        reporef = weakref.ref(self)
>> +        def updater(tr):
>> +            repo = reporef()
>> +            repo.updatecaches(tr)
>> +        return updater
>
> Out of curiosity, why do we need this whole weakref business? IIUC, repo holds
> no strong reference to the transaction object.

I've preserved the usage of the reporef from the 'transaction' method.

Tracking the introduction of the 'reporef' there trace back to
db8679812f84 and usage of repository in "afterlock" call.

So I'm not sure it is useful in this context but it probably do not hurt.

--
Pierre-Yves David
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH 5 of 8] obscache: add a cache for 1/2 of the "obsolete" property

Pierre-Yves David-2
In reply to this post by Augie Fackler-2
On 05/20/2017 12:32 AM, Augie Fackler wrote:

>> On May 19, 2017, at 15:15, Pierre-Yves David <[hidden email]> wrote:
>>
>> On 05/19/2017 11:53 PM, Augie Fackler wrote:
>>> On Fri, May 19, 2017 at 04:28:04PM +0200, Pierre-Yves David wrote:
>>>> # HG changeset patch
>>>> # User Pierre-Yves David <[hidden email]>
>>>> # Date 1495197986 -7200
>>>> #      Fri May 19 14:46:26 2017 +0200
>>>> # Node ID c2b1c81b3c4cba8cc21b1a1b92f4e34e6aa1807a
>>>> # Parent  b5c984cd0640255b94f22d56a4bfe398b2c5de2e
>>>> # EXP-Topic obscache
>>>> # Available At https://www.mercurial-scm.org/repo/users/marmoute/mercurial/
>>>> #              hg pull https://www.mercurial-scm.org/repo/users/marmoute/mercurial/ -r c2b1c81b3c4c
>>>> obscache: add a cache for 1/2 of the "obsolete" property
>>>
>>> So, can you point me to other (upcoming) concrete subclasses of
>>> dualsourcecache, or is this the only one?
>>
>> There are current a second user of this abstract class in evolve. It is used for obsdiscovery[1]. And I expect to we'll experiment with more of this class of cache in the near future.
>>
>> [1] https://www.mercurial-scm.org/repo/evolve/file/71d242e0fc1a/hgext3rd/evolve/obsdiscovery.py#l440
>>
>> The logic in the cache is easy to screw up and the end result is really need to use (implement save/load/reset/update(revs, obs)). Having a class ready to use is -really- handy.
>
> I wish it could be done without the use of inheritance, but it's fine-ish for now as an abstract class. Should it perhaps be using the abc module to be a more explicit base class?
>
> (I haven't ever used abc, so "no" or "not now" is fine)

That looks like a good idea, sold!

>> In addition, there are final stage of skip-obsstore optimization that requires some rework and collaboration in core something hard to access from an extension.
>>
>> So, having this intermediate abstract class is a corner-stone of the current contribution. (most of work in evolve 6.2 is actually about extracting and reusing this logic)
>>
>>> Also, could we put these big new classes in obscache.py or some other
>>> new file? I'm wary of adding more code to the already-1200-lines
>>> obsolete.py, and since these are coming from evolve I'm hoping that
>>> they're still easily separable from obsolete.py...
>>
>> This crossed my mind when adding it. However, there are significant improvement to that class arriving soon. These improvement will need to access (and refactor) some lower logic around reading markers and al. I think it is wiser to wait for the dust to settle after these upgrade and cut where this is
>
> I'm not really satisfied with this answer. Can you split things up into a better organization scheme so we don't end up with more of a mess in this already far-too-large module? That'd make things easier for me to reason about.

Since, I did not used the "not now" voucher for the question above. I
would like to it here :-). That module can be split up, but not now.

This series is about improving caching. It lands new cache related code
right next to other cache related code. The module grow bigger but not
messier.

Splitting this module (likely by creating a obsutil.py module for
utility functions) will require proper work (redirecting internal user,
adding deprecation for external user of the util, etc). That is a
significant unrelated scope bloat that is unlikely to affect the review
of this series.

There are nothing urgent in splitting this module, it is big, but not
massive by our current standard (there are about25 modules larger). I've
already split and cleaned many modules so you can trust me to keep a
good balance here.

I'll send a V2 to make the discussion move forward on the code.

>> (the version in evolve access some "_private" method and duplicates some code)
>
> ^ this is a sign to me that some more refactoring should probably happen before this lands, as much as I want it now.

I just see it as a sign there are advance experiments going on in the
evolve extension.

Cheers,

--
Pierre-Yves David
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH 1 of 8] cache: make the cache updated callback easily accessible to extension

Gregory Szorc
In reply to this post by Pierre-Yves David-2
On Sat, May 20, 2017 at 7:21 AM, Pierre-Yves David <[hidden email]> wrote:


On 05/20/2017 06:49 AM, Yuya Nishihara wrote:
On Fri, 19 May 2017 16:28:00 +0200, Pierre-Yves David wrote:
# HG changeset patch
# User Pierre-Yves David <[hidden email]>
# Date 1495192163 -7200
#      Fri May 19 13:09:23 2017 +0200
# Node ID 1fe18fcc122b4d8e14264a3670d2d5f99f11bd75
# Parent  8a87bfc5bebbbe0ac996ac8e047a029eb931af45
# EXP-Topic obscache
# Available At https://www.mercurial-scm.org/repo/users/marmoute/mercurial/
#              hg pull https://www.mercurial-scm.org/repo/users/marmoute/mercurial/ -r 1fe18fcc122b
cache: make the cache updated callback easily accessible to extension

This will help extension to benefit from this new logic. As a side effect this
clarify the 'transaction' method a little bit.

diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -1091,10 +1091,7 @@ class localrepository(object):
                                **pycompat.strkwargs(hookargs))
             reporef()._afterlock(hook)
         tr.addfinalize('txnclose-hook', txnclosehook)
-        def warmscache(tr2):
-            repo = reporef()
-            repo.updatecaches(tr2)
-        tr.addpostclose('warms-cache', warmscache)
+        tr.addpostclose('warms-cache', self._buildcacheupdater(tr))
         def txnaborthook(tr2):
             """To be run if transaction is aborted
             """
@@ -1229,6 +1226,20 @@ class localrepository(object):
         self.destroyed()
         return 0

+    def _buildcacheupdater(self, newtransaction):
+        """called during transaction to build the callback updating cache
+
+        Lives on the repository to help extension who might want to augment
+        this logic. For this purpose, the created transaction is passed to the
+        method.
+        """
+        # we must avoid cyclic reference between repo and transaction.
+        reporef = weakref.ref(self)
+        def updater(tr):
+            repo = reporef()
+            repo.updatecaches(tr)
+        return updater

Out of curiosity, why do we need this whole weakref business? IIUC, repo holds
no strong reference to the transaction object.

I've preserved the usage of the reporef from the 'transaction' method.

Tracking the introduction of the 'reporef' there trace back to db8679812f84 and usage of repository in "afterlock" call.

So I'm not sure it is useful in this context but it probably do not hurt.

I believe I inserted some of these. There are circular references involving repo instances. The ref arc is inserted through transactions, specifically these transaction callbacks, which either store a ref to the repo explicitly (as in this patch) or inherit one via the outer scope.

If a single repo undergoes multiple transactions, we effectively leak transactions, repoviews, and associated objects because of the circular refs. I believe we still have leaks today because I have a handful of old patches sitting around attempting to break some cycles by inserting more weakrefs and proxies. If you want to reproduce, run `hg convert` on a large repo.

_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
12
Loading...