[PATCH 1 of 5 RFC] util: add platform-agnostic absjoin function and some long-paths win utils

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH 1 of 5 RFC] util: add platform-agnostic absjoin function and some long-paths win utils

Kostia Balytskyi-2
# HG changeset patch
# User Kostia Balytskyi <[hidden email]>
# Date 1502481241 25200
#      Fri Aug 11 12:54:01 2017 -0700
# Node ID 586870aa06799e2ef49a2be4c8feab1464ad54b1
# Parent  609606d217659e0a6c1cf6f907b6512be5340e57
util: add platform-agnostic absjoin function and some long-paths win utils

The goal of this seies is to add support for long paths on Windows.
Context here is that Win32 API CreateFileA (which opens a file handle and
which is what CreateFile macro is resolved to by default) does not recognize
paths longer than MAX_PATH constant (260 chars), unless the path starts
with \\?\, which causes API to not do any checks or preprocessing of the
file path. Therefore, it is posible to use long paths with this prefix,
but it comes at a cost: forward slashes are not automatically replaced by
baclslashes, . and .. are not automatically resolved to appropriate dirs.

Also, it is hard to change paths to use //?/ everywhere, so I propose that
the approach is to handle both types of paths simultaneously.

Also, even if this series is not accepted as it is, I would like to start
a discussion about solving this problem in general.

This particular patch adds some first steps in reaching this goal:
- it adds helper functions hasntprefix and converttontpath
- it adds absjoin function which is supposed to replace _join method
of dirstate

diff --git a/mercurial/posix.py b/mercurial/posix.py
--- a/mercurial/posix.py
+++ b/mercurial/posix.py
@@ -666,3 +666,9 @@ def bindunixsocket(sock, path):
     if bakwdfd:
         os.fchdir(bakwdfd)
         os.close(bakwdfd)
+
+def absjoin(absprefix, relpath):
+    """Join a relative path to an absolute path. This function assumes that
+    the passed absolute path already contains a terminating directory
+    separator."""
+    return absprefix + relpath
diff --git a/mercurial/util.py b/mercurial/util.py
--- a/mercurial/util.py
+++ b/mercurial/util.py
@@ -149,6 +149,7 @@ testpid = platform.testpid
 umask = platform.umask
 unlink = platform.unlink
 username = platform.username
+absjoin = platform.absjoin
 
 try:
     recvfds = osutil.recvfds
diff --git a/mercurial/windows.py b/mercurial/windows.py
--- a/mercurial/windows.py
+++ b/mercurial/windows.py
@@ -236,6 +236,28 @@ def normpath(path):
 def normcase(path):
     return encoding.upper(path) # NTFS compares via upper()
 
+def converttontpath(path):
+    """Prepend \\?\ prefix to a path and perform slash replacements
+    The '\\?\' prefix tells Win32 API to not perform userland path checks,
+    therefore allowing long paths. This also means that / are not replaced
+    with \ by Win32, so we need to do it manually. See MSDN entry on
+    CreateFile function for more details"""
+    return '\\\\?\\' + os.path.normpath(path)
+
+def hasntprefix(path):
+    """Check if path starts with a \\?\ prefix"""
+    return path.startswith('\\\\?\\')
+
+def absjoin(absprefix, relpath):
+    """See docblock for posix.absjoin for explanation of what this does."""
+    if hasntprefix(absprefix):
+        # we assume that if absprefix starts with \\?\, this means that it
+        # was already preprocessed by converttontpath and therefore does
+        # not need to passed to `localpath`
+        return absprefix + localpath(relpath)
+    else:
+        return converttontpath(absprefix + relpath)
+
 # see posix.py for definitions
 normcasespec = encoding.normcasespecs.upper
 normcasefallback = encoding.upperfallback
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH 2 of 5 RFC] windows: teach dirstate and posixfile to understand long paths

Kostia Balytskyi-2
# HG changeset patch
# User Kostia Balytskyi <[hidden email]>
# Date 1502522414 25200
#      Sat Aug 12 00:20:14 2017 -0700
# Node ID a1c8f310a66ada40697c824f4621bb54c757e9fa
# Parent  586870aa06799e2ef49a2be4c8feab1464ad54b1
windows: teach dirstate and posixfile to understand long paths

diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
--- a/mercurial/dirstate.py
+++ b/mercurial/dirstate.py
@@ -269,9 +269,7 @@ class dirstate(object):
         return not util.fscasesensitive(self._join('.hg'))
 
     def _join(self, f):
-        # much faster than os.path.join()
-        # it's safe because f is always a relative path
-        return self._rootdir + f
+        return util.absjoin(self._rootdir, f)
 
     def flagfunc(self, buildfallback):
         if self._checklink and self._checkexec:
diff --git a/mercurial/windows.py b/mercurial/windows.py
--- a/mercurial/windows.py
+++ b/mercurial/windows.py
@@ -123,6 +123,9 @@ class mixedfilemodewrapper(object):
 def posixfile(name, mode='r', buffering=-1):
     '''Open a file with even more POSIX-like semantics'''
     try:
+        if os.path.isabs(name) and not hasntprefix(name):
+            # we're out of luck if path is relative
+            name = converttontpath(name)
         fp = osutil.posixfile(name, mode, buffering) # may raise WindowsError
 
         # The position when opening in append mode is implementation defined, so
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH 3 of 5 RFC] dirstate: make dirstate._join a bit faster on posix

Kostia Balytskyi-2
In reply to this post by Kostia Balytskyi-2
# HG changeset patch
# User Kostia Balytskyi <[hidden email]>
# Date 1502522493 25200
#      Sat Aug 12 00:21:33 2017 -0700
# Node ID bac7b1ae4a0c998c57cea5eac51af66853779918
# Parent  a1c8f310a66ada40697c824f4621bb54c757e9fa
dirstate: make dirstate._join a bit faster on posix

This is a micro-optimization, suggested by @quark for one of the previous
patches. It is safe to land the series without this micro-optimization.

diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
--- a/mercurial/dirstate.py
+++ b/mercurial/dirstate.py
@@ -99,6 +99,9 @@ class dirstate(object):
         # for consistent view between _pl() and _read() invocations
         self._pendingmode = None
 
+        if pycompat.osname != 'nt':
+            self._join = self._join_posix
+
     @contextlib.contextmanager
     def parentchange(self):
         '''Context manager for handling dirstate parents.
@@ -271,6 +274,13 @@ class dirstate(object):
     def _join(self, f):
         return util.absjoin(self._rootdir, f)
 
+    def _join_posix(self, f):
+        # The cross-platform _join() has windows-specific logic that
+        # is much more branchy. Since this is called O(files in working copy)
+        # times regularly, we microoptimize here.
+        # This is safe because f is always a relative path.
+        return self._rootdir + f
+
     def flagfunc(self, buildfallback):
         if self._checklink and self._checkexec:
             def f(x):
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH 4 of 5 RFC] windows: add relpath which knows about \\?\ prefix

Kostia Balytskyi-2
In reply to this post by Kostia Balytskyi-2
# HG changeset patch
# User Kostia Balytskyi <[hidden email]>
# Date 1502522597 25200
#      Sat Aug 12 00:23:17 2017 -0700
# Node ID 85d8ab670c128337f27013168e3f57b037d7996e
# Parent  bac7b1ae4a0c998c57cea5eac51af66853779918
windows: add relpath which knows about \\?\ prefix

We want to correctly handle things like relpath('c:\\a\\b', '\\\\\?\\c:\\a')
on Windows, since it's hard to immediately make sure that \\?\ is used
everywhere.

diff --git a/hgext/largefiles/lfcommands.py b/hgext/largefiles/lfcommands.py
--- a/hgext/largefiles/lfcommands.py
+++ b/hgext/largefiles/lfcommands.py
@@ -11,7 +11,6 @@ from __future__ import absolute_import
 
 import errno
 import hashlib
-import os
 import shutil
 
 from mercurial.i18n import _
@@ -461,11 +460,11 @@ def updatelfiles(ui, repo, filelist=None
         wctx = repo[None]
         for lfile in lfiles:
             rellfile = lfile
-            rellfileorig = os.path.relpath(
+            rellfileorig = util.relpath(
                 scmutil.origpath(ui, repo, wvfs.join(rellfile)),
                 start=repo.root)
             relstandin = lfutil.standin(lfile)
-            relstandinorig = os.path.relpath(
+            relstandinorig = util.relpath(
                 scmutil.origpath(ui, repo, wvfs.join(relstandin)),
                 start=repo.root)
             if wvfs.exists(relstandin):
diff --git a/mercurial/hg.py b/mercurial/hg.py
--- a/mercurial/hg.py
+++ b/mercurial/hg.py
@@ -241,7 +241,7 @@ def share(ui, source, dest=None, update=
 
     if relative:
         try:
-            sharedpath = os.path.relpath(sharedpath, destvfs.base)
+            sharedpath = util.relpath(sharedpath, destvfs.base)
             requirements += 'relshared\n'
         except IOError as e:
             raise error.Abort(_('cannot calculate relative path'),
diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -2190,7 +2190,7 @@ class localrepository(object):
             fp.write(text)
         finally:
             fp.close()
-        return self.pathto(fp.name[len(self.root) + 1:])
+        return self.pathto(util.relpath(fp.name, self.root))
 
 # used to avoid circular references so destructors work
 def aftertrans(files):
diff --git a/mercurial/pathutil.py b/mercurial/pathutil.py
--- a/mercurial/pathutil.py
+++ b/mercurial/pathutil.py
@@ -179,7 +179,7 @@ def canonpath(root, cwd, myname, auditor
             if cwd != root:
                 canonpath(root, root, myname, auditor)
                 hint = (_("consider using '--cwd %s'")
-                        % os.path.relpath(root, cwd))
+                        % util.relpath(root, cwd))
         except error.Abort:
             pass
 
diff --git a/mercurial/posix.py b/mercurial/posix.py
--- a/mercurial/posix.py
+++ b/mercurial/posix.py
@@ -672,3 +672,6 @@ def absjoin(absprefix, relpath):
     the passed absolute path already contains a terminating directory
     separator."""
     return absprefix + relpath
+
+relpath = os.path.relpath
+
diff --git a/mercurial/scmutil.py b/mercurial/scmutil.py
--- a/mercurial/scmutil.py
+++ b/mercurial/scmutil.py
@@ -556,7 +556,7 @@ def origpath(ui, repo, filepath):
     if origbackuppath is None:
         return filepath + ".orig"
 
-    filepathfromroot = os.path.relpath(filepath, start=repo.root)
+    filepathfromroot = util.relpath(filepath, start=repo.root)
     fullorigpath = repo.wjoin(origbackuppath, filepathfromroot)
 
     origbackupdir = repo.vfs.dirname(fullorigpath)
diff --git a/mercurial/util.py b/mercurial/util.py
--- a/mercurial/util.py
+++ b/mercurial/util.py
@@ -150,6 +150,7 @@ umask = platform.umask
 unlink = platform.unlink
 username = platform.username
 absjoin = platform.absjoin
+relpath = platform.relpath
 
 try:
     recvfds = osutil.recvfds
diff --git a/mercurial/windows.py b/mercurial/windows.py
--- a/mercurial/windows.py
+++ b/mercurial/windows.py
@@ -261,6 +261,14 @@ def absjoin(absprefix, relpath):
     else:
         return converttontpath(absprefix + relpath)
 
+def relpath(path, start=os.path.curdir):
+    """Relpath which knows how to work with //?/ prefixes"""
+    if not hasntprefix(path) and os.path.isabs(path):
+        path = converttontpath(path)
+    if not hasntprefix(start) and os.path.isabs(start):
+        start = converttontpath(start)
+    return os.path.relpath(path, start)
+
 # see posix.py for definitions
 normcasespec = encoding.normcasespecs.upper
 normcasefallback = encoding.upperfallback
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH 5 of 5 RFC] run_tests: know how to match Windows //?/C:\Something paths

Kostia Balytskyi-2
In reply to this post by Kostia Balytskyi-2
# HG changeset patch
# User Kostia Balytskyi <[hidden email]>
# Date 1502478965 -10800
#      Fri Aug 11 22:16:05 2017 +0300
# Node ID 83de181a018eb0e7ea138121a854a04569140cbe
# Parent  85d8ab670c128337f27013168e3f57b037d7996e
run_tests: know how to match Windows //?/C:\Something paths

diff --git a/tests/run-tests.py b/tests/run-tests.py
--- a/tests/run-tests.py
+++ b/tests/run-tests.py
@@ -903,7 +903,19 @@ class Test(unittest.TestCase):
 
     def _escapepath(self, p):
         if os.name == 'nt':
+            # Explanation of the clowntown below:
+            # We want to match NT prefix \\?\ (see MSDN on Windows File Names).
+            # In order to match a signle \ in re, we need to escape it with
+            # another \, thus re has to receive \\ for each \ we want to match.
+            # Also, to match ?, re has to receive \?. But since we're
+            # concatenating strings, we can't use r'', so we have to also
+            # python-string-escape everything. Therefore:
+            # Want to match  |  re must receive  |  python str must contain
+            #       /        |         //        |         ////
+            #       ?        |         /?        |         //?
+            #     //?/       |      /////?//     |    //////////?////
             return (
+                b'(\\\\\\\\\\?\\\\){0,1}' +
                 (b''.join(c.isalpha() and b'[%s%s]' % (c.lower(), c.upper()) or
                     c in b'/\\' and br'[/\\]' or c.isdigit() and c or b'\\' + c
                     for c in p))
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH 1 of 5 RFC] util: add platform-agnostic absjoin function and some long-paths win utils

Gregory Szorc
In reply to this post by Kostia Balytskyi-2


> On Aug 12, 2017, at 01:22, Kostia Balytskyi <[hidden email]> wrote:
>
> # HG changeset patch
> # User Kostia Balytskyi <[hidden email]>
> # Date 1502481241 25200
> #      Fri Aug 11 12:54:01 2017 -0700
> # Node ID 586870aa06799e2ef49a2be4c8feab1464ad54b1
> # Parent  609606d217659e0a6c1cf6f907b6512be5340e57
> util: add platform-agnostic absjoin function and some long-paths win utils
>
> The goal of this seies is to add support for long paths on Windows.
> Context here is that Win32 API CreateFileA (which opens a file handle and
> which is what CreateFile macro is resolved to by default) does not recognize
> paths longer than MAX_PATH constant (260 chars), unless the path starts
> with \\?\, which causes API to not do any checks or preprocessing of the
> file path. Therefore, it is posible to use long paths with this prefix,
> but it comes at a cost: forward slashes are not automatically replaced by
> baclslashes, . and .. are not automatically resolved to appropriate dirs.
>
> Also, it is hard to change paths to use //?/ everywhere, so I propose that
> the approach is to handle both types of paths simultaneously.
>
> Also, even if this series is not accepted as it is, I would like to start
> a discussion about solving this problem in general.
>
> This particular patch adds some first steps in reaching this goal:
> - it adds helper functions hasntprefix and converttontpath
> - it adds absjoin function which is supposed to replace _join method
> of dirstate

Cool series! This feature is long overdue.

At the point proper path support on Windows is fully implemented in Mercurial, how much of Python 3's pathlib have we reinvented? In other words, should we start coding to the pathlib API now so all kinds of path operations "just work?"

>
> diff --git a/mercurial/posix.py b/mercurial/posix.py
> --- a/mercurial/posix.py
> +++ b/mercurial/posix.py
> @@ -666,3 +666,9 @@ def bindunixsocket(sock, path):
>     if bakwdfd:
>         os.fchdir(bakwdfd)
>         os.close(bakwdfd)
> +
> +def absjoin(absprefix, relpath):
> +    """Join a relative path to an absolute path. This function assumes that
> +    the passed absolute path already contains a terminating directory
> +    separator."""
> +    return absprefix + relpath
> diff --git a/mercurial/util.py b/mercurial/util.py
> --- a/mercurial/util.py
> +++ b/mercurial/util.py
> @@ -149,6 +149,7 @@ testpid = platform.testpid
> umask = platform.umask
> unlink = platform.unlink
> username = platform.username
> +absjoin = platform.absjoin
>
> try:
>     recvfds = osutil.recvfds
> diff --git a/mercurial/windows.py b/mercurial/windows.py
> --- a/mercurial/windows.py
> +++ b/mercurial/windows.py
> @@ -236,6 +236,28 @@ def normpath(path):
> def normcase(path):
>     return encoding.upper(path) # NTFS compares via upper()
>
> +def converttontpath(path):
> +    """Prepend \\?\ prefix to a path and perform slash replacements
> +    The '\\?\' prefix tells Win32 API to not perform userland path checks,
> +    therefore allowing long paths. This also means that / are not replaced
> +    with \ by Win32, so we need to do it manually. See MSDN entry on
> +    CreateFile function for more details"""
> +    return '\\\\?\\' + os.path.normpath(path)
> +
> +def hasntprefix(path):
> +    """Check if path starts with a \\?\ prefix"""
> +    return path.startswith('\\\\?\\')
> +
> +def absjoin(absprefix, relpath):
> +    """See docblock for posix.absjoin for explanation of what this does."""
> +    if hasntprefix(absprefix):
> +        # we assume that if absprefix starts with \\?\, this means that it
> +        # was already preprocessed by converttontpath and therefore does
> +        # not need to passed to `localpath`
> +        return absprefix + localpath(relpath)
> +    else:
> +        return converttontpath(absprefix + relpath)
> +
> # see posix.py for definitions
> normcasespec = encoding.normcasespecs.upper
> normcasefallback = encoding.upperfallback
> _______________________________________________
> Mercurial-devel mailing list
> [hidden email]
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH 3 of 5 RFC] dirstate: make dirstate._join a bit faster on posix

Yuya Nishihara
In reply to this post by Kostia Balytskyi-2
On Sat, 12 Aug 2017 01:22:58 -0700, Kostia Balytskyi wrote:

> # HG changeset patch
> # User Kostia Balytskyi <[hidden email]>
> # Date 1502522493 25200
> #      Sat Aug 12 00:21:33 2017 -0700
> # Node ID bac7b1ae4a0c998c57cea5eac51af66853779918
> # Parent  a1c8f310a66ada40697c824f4621bb54c757e9fa
> dirstate: make dirstate._join a bit faster on posix
>
> This is a micro-optimization, suggested by @quark for one of the previous
> patches. It is safe to land the series without this micro-optimization.
>
> diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
> --- a/mercurial/dirstate.py
> +++ b/mercurial/dirstate.py
> @@ -99,6 +99,9 @@ class dirstate(object):
>          # for consistent view between _pl() and _read() invocations
>          self._pendingmode = None
>  
> +        if pycompat.osname != 'nt':
> +            self._join = self._join_posix

Assigning a bound method to self creates a reference cycle. nt/posix can be
selected statically.
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Loading...