D4850: store: pass matcher to store.datafiles() and filter files according to it

martinvonz (Martin von Zweigbergk)
pulkit created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  To get narrow stream clones working, we need a way to filter the storage files
  using a matcher. This patch adds matcher as an argument to store.walk() and
  store.datafiles() so that we can filter the files returned according to the
  matcher.
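
As a rough illustration of the filtering described in the summary (not the code in this patch), the sketch below filters `(unencoded, encoded, size)` entries with a matcher callable. `tracked_path` and `filter_datafiles` are hypothetical names, and treating filelog entries as `data/<path>.i` or `data/<path>.d` is a simplifying assumption about the store layout.

```python
def tracked_path(store_path):
    """Map a filelog store path to the tracked path it holds.

    Hypothetical helper: assumes entries look like 'data/<path>.i' or
    'data/<path>.d' and returns None for anything else.
    """
    if not store_path.startswith('data/'):
        return None
    if not store_path.endswith(('.i', '.d')):
        return None
    return store_path[len('data/'):-2]


def filter_datafiles(entries, matcher=None):
    """Yield only the (unencoded, encoded, size) entries the matcher accepts.

    With matcher=None every entry is yielded, so existing callers are
    unaffected. Entries that are not filelogs are skipped when a matcher
    is given (an assumption of this sketch).
    """
    for unencoded, encoded, size in entries:
        if matcher is not None:
            path = tracked_path(unencoded)
            if path is None or not matcher(path):
                continue
        yield unencoded, encoded, size


entries = [
    ('data/some/dir/foo.i', 'data/some/dir/foo.i', 120),
    ('data/other/bar.i', 'data/other/bar.i', 64),
]


def under_some(path):
    # Stand-in matcher: any callable taking a path and returning a bool.
    return path.startswith('some/')


kept = list(filter_datafiles(entries, under_some))
```

Defaulting `matcher` to `None` keeps the unfiltered behaviour for callers that do not pass one.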

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D4850

AFFECTED FILES
  mercurial/store.py

CHANGE DETAILS

diff --git a/mercurial/store.py b/mercurial/store.py
--- a/mercurial/store.py
+++ b/mercurial/store.py
@@ -374,17 +374,21 @@
         l.sort()
         return l
 
-    def datafiles(self):
+    def datafiles(self, matcher=None):
         return self._walk('data', True) + self._walk('meta', True)
 
     def topfiles(self):
         # yield manifest before changelog
         return reversed(self._walk('', False))
 
-    def walk(self):
-        '''yields (unencoded, encoded, size)'''
+    def walk(self, matcher=None):
+        '''yields (unencoded, encoded, size)
+
+        if a matcher is passed, storage files of only those tracked paths
+        are passed with matches the matcher
+        '''
         # yield data files first
-        for x in self.datafiles():
+        for x in self.datafiles(matcher):
             yield x
         for x in self.topfiles():
             yield x
@@ -422,12 +426,14 @@
         self.vfs = vfsmod.filtervfs(vfs, encodefilename)
         self.opener = self.vfs
 
-    def datafiles(self):
+    def datafiles(self, matcher=None):
         for a, b, size in super(encodedstore, self).datafiles():
             try:
                 a = decodefilename(a)
             except KeyError:
                 a = None
+            if matcher and not matcher(_gettrackedpath(a)):
+                continue
             yield a, b, size
 
     def join(self, f):
@@ -551,8 +557,10 @@
     def getsize(self, path):
         return self.rawvfs.stat(path).st_size
 
-    def datafiles(self):
+    def datafiles(self, matcher=None):
         for f in sorted(self.fncache):
+            if matcher and not matcher(_gettrackedpath(f)):
+                continue
             ef = self.encode(f)
             try:
                 yield f, ef, self.getsize(ef)



To: pulkit, #hg-reviewers
Cc: mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

D4850: store: pass matcher to store.datafiles() and filter files according to it

martinvonz (Martin von Zweigbergk)
martinvonz added inline comments.

INLINE COMMENTS

> store.py:429-437
> +    def datafiles(self, matcher=None):
>          for a, b, size in super(encodedstore, self).datafiles():
>              try:
>                  a = decodefilename(a)
>              except KeyError:
>                  a = None
> +            if matcher and not matcher(_gettrackedpath(a)):

This doesn't seem right to me. Say the `matcher` is `rootfilesin:some/dir`. Then `matcher('some/dir/foo')` will be True, but `matcher('some')` (the first-level directory) will not be. That seems to mean that the client will not get all the directories it needs.

Maybe this code needs to be made less generic and start walking the directories the way our other tree-walking algorithms do. In repos that use tree manifests for all their commits, we should be able to walk the directories in `.hg/store/meta` and look for files in `.hg/store/data` only for directories found in that walk. However, that only works if all commits use treemanifests. I think it's good enough for now (and maybe forever) to instead pass all the file names into a `util.dirs` object and then walk those directories using `matcher.visitdir()`. For each directory found that way, we would look for the manifest revlog in `.hg/store/meta` and include it if it's found (and ignore it if it's not).
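
The difference between calling the matcher on a directory and asking `visitdir()` about it can be sketched with a toy matcher. `RootFilesIn` below is a hypothetical stand-in written for this example, not Mercurial's matcher class, and `parent_dirs` only approximates what a `util.dirs` object collects. The walk is simplified to a filter over all ancestor directories.

```python
class RootFilesIn:
    """Toy stand-in for a `rootfilesin:<dir>` matcher (not Mercurial's class)."""

    def __init__(self, directory):
        self.directory = directory

    def __call__(self, path):
        # Matches only files located directly in self.directory.
        parent, _, name = path.rpartition('/')
        return bool(name) and parent == self.directory

    def visitdir(self, directory):
        # True if the directory is the target or one of its ancestors
        # ('' stands for the repository root).
        if directory == '':
            return True
        return (self.directory == directory
                or self.directory.startswith(directory + '/'))


def parent_dirs(files):
    """Collect every ancestor directory of the given file names."""
    dirs = set()
    for f in files:
        while '/' in f:
            f = f.rsplit('/', 1)[0]
            dirs.add(f)
    return dirs


m = RootFilesIn('some/dir')
files = ['some/dir/foo', 'some/dir/sub/baz', 'other/bar']

matched_files = [f for f in files if m(f)]
# Directories whose manifest revlog the client would need.
needed_dirs = sorted(d for d in parent_dirs(files) if m.visitdir(d))
```

Here `m('some')` is false while `m.visitdir('some')` is true, so filtering directories with the plain matcher call would drop `some` even though it is needed to reach `some/dir`.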

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D4850

To: pulkit, #hg-reviewers
Cc: martinvonz, mercurial-devel

D4850: store: pass matcher to store.datafiles() and filter files according to it

martinvonz (Martin von Zweigbergk)
indygreg added a comment.


  Another conceptual problem with this is that it assumes `data/` and `meta/` are used for tracking just filelogs and manifestlogs. In theory, other revlogs / data files could be stored there.
 
  For files / `data/` paths, I think we're OK making this assumption. But for manifests / `meta/`, I would feel better if we built up a set of tree manifest directories and then intersected that with files in `meta/` that map to their revlogs.
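
A minimal sketch of the intersection suggested above, assuming the tree manifest revlog for a directory `d` lives at `meta/<d>/00manifest.i`. The function names and the sample file list are hypothetical.

```python
def manifest_revlog_path(directory):
    """Store path assumed to hold the tree manifest revlog of a directory."""
    return 'meta/%s/00manifest.i' % directory


def select_meta_files(tree_dirs, store_files):
    """Keep only meta/ files that are the manifest revlog of a known directory.

    Anything else under meta/ (some other revlog or data file) is left out,
    and directories without a manifest revlog are simply ignored.
    """
    expected = {manifest_revlog_path(d) for d in tree_dirs}
    return sorted(expected & set(store_files))


store_files = [
    'meta/some/00manifest.i',
    'meta/some/dir/00manifest.i',
    'meta/some/dir/extension-data.bin',  # hypothetical non-manifest file
]
selected = select_meta_files({'some', 'some/dir', 'other'}, store_files)
```

Starting from the set of directories rather than from the files found in `meta/` means unrelated files stored there are never picked up by accident.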

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D4850

To: pulkit, #hg-reviewers, martinvonz
Cc: indygreg, martinvonz, mercurial-devel

D4850: store: pass matcher to store.datafiles() and filter files according to it

martinvonz (Martin von Zweigbergk)
pulkit added a comment.


  In https://phab.mercurial-scm.org/D4850#75708, @indygreg wrote:
 
  > Another conceptual problem with this is that it assumes `data/` and `meta/` are used for tracking just filelogs and manifestlogs. In theory, other revlogs / data files could be stored there.
  >
  > For files / `data/` paths, I think we're OK making this assumption. But for manifests / `meta/`, I would feel better if we built up a set of tree manifest directories and then intersected that with files in `meta/` that map to their revlogs.
 
 
  I discussed this with martinvonz on Friday and we decided to use `matcher.visitdir()` for the `meta/` entries.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D4850

To: pulkit, #hg-reviewers, martinvonz
Cc: indygreg, martinvonz, mercurial-devel

D4850: store: pass matcher to store.datafiles()

martinvonz (Martin von Zweigbergk)
pulkit updated this revision to Diff 12210.
pulkit retitled this revision from "store: pass matcher to store.datafiles() and filter files according to it" to "store: pass matcher to store.datafiles()".

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D4850?vs=11602&id=12210

REVISION DETAIL
  https://phab.mercurial-scm.org/D4850

AFFECTED FILES
  mercurial/store.py

CHANGE DETAILS

diff --git a/mercurial/store.py b/mercurial/store.py
--- a/mercurial/store.py
+++ b/mercurial/store.py
@@ -359,17 +359,21 @@
         l.sort()
         return l
 
-    def datafiles(self):
+    def datafiles(self, matcher=None):
         return self._walk('data', True) + self._walk('meta', True)
 
     def topfiles(self):
         # yield manifest before changelog
         return reversed(self._walk('', False))
 
-    def walk(self):
-        '''yields (unencoded, encoded, size)'''
+    def walk(self, matcher=None):
+        '''yields (unencoded, encoded, size)
+
+        if a matcher is passed, storage files of only those tracked paths
+        are passed with matches the matcher
+        '''
         # yield data files first
-        for x in self.datafiles():
+        for x in self.datafiles(matcher):
             yield x
         for x in self.topfiles():
             yield x
@@ -407,7 +411,7 @@
         self.vfs = vfsmod.filtervfs(vfs, encodefilename)
         self.opener = self.vfs
 
-    def datafiles(self):
+    def datafiles(self, matcher=None):
         for a, b, size in super(encodedstore, self).datafiles():
             try:
                 a = decodefilename(a)
@@ -536,7 +540,7 @@
     def getsize(self, path):
         return self.rawvfs.stat(path).st_size
 
-    def datafiles(self):
+    def datafiles(self, matcher=None):
         for f in sorted(self.fncache):
             ef = self.encode(f)
             try:



To: pulkit, #hg-reviewers, martinvonz
Cc: indygreg, martinvonz, mercurial-devel

D4850: store: pass matcher to store.datafiles()

martinvonz (Martin von Zweigbergk)
This revision was automatically updated to reflect the committed changes.
Closed by commit rHG2d45b549392f: store: pass matcher to store.datafiles() (authored by pulkit, committed by ).

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D4850?vs=12210&id=12229

REVISION DETAIL
  https://phab.mercurial-scm.org/D4850

AFFECTED FILES
  mercurial/store.py

CHANGE DETAILS

diff --git a/mercurial/store.py b/mercurial/store.py
--- a/mercurial/store.py
+++ b/mercurial/store.py
@@ -359,17 +359,21 @@
         l.sort()
         return l
 
-    def datafiles(self):
+    def datafiles(self, matcher=None):
         return self._walk('data', True) + self._walk('meta', True)
 
     def topfiles(self):
         # yield manifest before changelog
         return reversed(self._walk('', False))
 
-    def walk(self):
-        '''yields (unencoded, encoded, size)'''
+    def walk(self, matcher=None):
+        '''yields (unencoded, encoded, size)
+
+        if a matcher is passed, storage files of only those tracked paths
+        are passed with matches the matcher
+        '''
         # yield data files first
-        for x in self.datafiles():
+        for x in self.datafiles(matcher):
             yield x
         for x in self.topfiles():
             yield x
@@ -407,7 +411,7 @@
         self.vfs = vfsmod.filtervfs(vfs, encodefilename)
         self.opener = self.vfs
 
-    def datafiles(self):
+    def datafiles(self, matcher=None):
         for a, b, size in super(encodedstore, self).datafiles():
             try:
                 a = decodefilename(a)
@@ -536,7 +540,7 @@
     def getsize(self, path):
         return self.rawvfs.stat(path).st_size
 
-    def datafiles(self):
+    def datafiles(self, matcher=None):
         for f in sorted(self.fncache):
             ef = self.encode(f)
             try:



To: pulkit, #hg-reviewers, martinvonz
Cc: indygreg, martinvonz, mercurial-devel