[PATCH 1 of 2 stable] hgweb: deduplicate code

[PATCH 1 of 2 stable] hgweb: deduplicate code

Manuel Jacob
# HG changeset patch
# User Manuel Jacob <[hidden email]>
# Date 1593047413 -7200
#      Thu Jun 25 03:10:13 2020 +0200
# Branch stable
# Node ID 8f730a30fb20a104bbf5665e1f7d0d4e4aaedf6f
# Parent  3d41172f2ac9ae7b644f3e5f239c28e895f1c4fa
# EXP-Topic cgi_env_encoding
hgweb: deduplicate code

A follow-up patch will change the way keys and values are encoded. To reduce that
diff, I’ve split off the uninteresting refactoring into this patch.

diff --git a/mercurial/hgweb/request.py b/mercurial/hgweb/request.py
--- a/mercurial/hgweb/request.py
+++ b/mercurial/hgweb/request.py
@@ -162,11 +162,11 @@
     # strings on Python 3 must be between \00000-\000FF. We deal with bytes
     # in Mercurial, so mass convert string keys and values to bytes.
     if pycompat.ispy3:
-        env = {k.encode('latin-1'): v for k, v in pycompat.iteritems(env)}
-        env = {
-            k: v.encode('latin-1') if isinstance(v, str) else v
-            for k, v in pycompat.iteritems(env)
-        }
+        def tobytes(s):
+            if not isinstance(s, str):
+                return s
+            return s.encode('latin-1')
+        env = {tobytes(k): tobytes(v) for k, v in pycompat.iteritems(env)}
 
     # Some hosting solutions are emulating hgwebdir, and dispatching directly
     # to an hgweb instance using this environment variable.  This was always
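
As a minimal sketch of what this refactoring does, the two functions below mirror the removed and the added lines outside of Mercurial. The names encode_env_old and encode_env_new are hypothetical, and env.items() stands in for pycompat.iteritems(env). For an environment whose keys are all str, both variants should produce the same dictionary; the only difference is that the new helper also passes non-str keys through instead of calling .encode() on them.

```python
def encode_env_old(env):
    # Mirrors the removed lines: keys first, then values in a second pass.
    env = {k.encode('latin-1'): v for k, v in env.items()}
    env = {
        k: v.encode('latin-1') if isinstance(v, str) else v
        for k, v in env.items()
    }
    return env


def tobytes(s):
    # Mirrors the added helper: non-str objects are returned unchanged.
    if not isinstance(s, str):
        return s
    return s.encode('latin-1')


def encode_env_new(env):
    # Mirrors the added line: one pass, same helper for keys and values.
    return {tobytes(k): tobytes(v) for k, v in env.items()}


# A tiny WSGI-like environment with one non-str value.
sample_env = {'REQUEST_METHOD': 'GET', 'wsgi.version': (1, 0)}
assert encode_env_old(sample_env) == encode_env_new(sample_env)
```

Having a single helper means the follow-up patch only needs to change the encoding in one place.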
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

[PATCH 2 of 2 stable] hgweb: encode WSGI environment like OS environment

Manuel Jacob
# HG changeset patch
# User Manuel Jacob <[hidden email]>
# Date 1593049567 -7200
#      Thu Jun 25 03:46:07 2020 +0200
# Branch stable
# Node ID c115cca2d19d55c2538def5c95a68ceff597f45d
# Parent  8f730a30fb20a104bbf5665e1f7d0d4e4aaedf6f
# EXP-Topic cgi_env_encoding
hgweb: encode WSGI environment like OS environment

Previously, the WSGI environment keys and values were encoded using latin-1.
This resulted in a crash if a WSGI environment key or value could not be encoded
using latin-1.

On Unix, the OS environment is byte-based. Therefore we should do the reverse of
what Python does for os.environ.

On Windows, there’s no native byte-based OS environment. Therefore we should do
the same as what mercurial.encoding does with the OS environment.

diff --git a/mercurial/hgweb/request.py b/mercurial/hgweb/request.py
--- a/mercurial/hgweb/request.py
+++ b/mercurial/hgweb/request.py
@@ -8,10 +8,13 @@
 
 from __future__ import absolute_import
 
+import sys
+
 # import wsgiref.validate
 
 from ..thirdparty import attr
 from .. import (
+    encoding,
     error,
     pycompat,
     util,
@@ -162,10 +165,18 @@
     # strings on Python 3 must be between \00000-\000FF. We deal with bytes
     # in Mercurial, so mass convert string keys and values to bytes.
     if pycompat.ispy3:
+        fsencoding = sys.getfilesystemencoding()
+
         def tobytes(s):
             if not isinstance(s, str):
                 return s
-            return s.encode('latin-1')
+            if pycompat.iswindows:
+                # This is what mercurial.encoding does for os.environ on Windows.
+                return encoding.strtolocal(s)
+            else:
+                # This is what is documented to be used for os.environ on Unix.
+                return s.encode(fsencoding, 'surrogateescape')
+
         env = {tobytes(k): tobytes(v) for k, v in pycompat.iteritems(env)}
 
     # Some hosting solutions are emulating hgwebdir, and dispatching directly
diff --git a/tests/test-wsgirequest.py b/tests/test-wsgirequest.py
--- a/tests/test-wsgirequest.py
+++ b/tests/test-wsgirequest.py
@@ -3,7 +3,7 @@
 import unittest
 
 from mercurial.hgweb import request as requestmod
-from mercurial import error
+from mercurial import error, pycompat
 
 DEFAULT_ENV = {
     'REQUEST_METHOD': 'GET',
@@ -432,6 +432,18 @@
         self.assertEqual(r.dispatchpath, b'path1/path2')
         self.assertEqual(r.reponame, b'repo')
 
+    def testenvencoding(self):
+        if pycompat.iswindows:
+            # On Windows, we can't generally know which non-ASCII characters
+            # are supported.
+            r = parse(DEFAULT_ENV, extra={'foo': 'bar'})
+            self.assertEqual(r.rawenv[b'foo'], b'bar')
+        else:
+            # Unix is byte-based. Therefore we test all possible bytes.
+            b = b''.join(pycompat.bytechr(i) for i in range(256))
+            r = parse(DEFAULT_ENV, extra={'foo': pycompat.fsdecode(b)})
+            self.assertEqual(r.rawenv[b'foo'], b)
+
 
 if __name__ == '__main__':
     import silenttestrunner
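
The crash described in the commit message can be reproduced without Mercurial. The following is a minimal sketch that assumes Python 3 on Unix, where os.fsdecode() and os.fsencode() use the surrogateescape error handler; the helper names tobytes_latin1 and tobytes_fs are hypothetical stand-ins for the old and new Unix code paths. The Windows branch (encoding.strtolocal) is not covered here.

```python
import os


def tobytes_latin1(s):
    # Old behaviour: only code points U+0000..U+00FF can be encoded.
    if not isinstance(s, str):
        return s
    return s.encode('latin-1')


def tobytes_fs(s):
    # New behaviour on Unix: the reverse of what Python does for os.environ.
    if not isinstance(s, str):
        return s
    return os.fsencode(s)


# Every possible byte value, as a Unix environment could contain it.
raw = bytes(range(256))
value = os.fsdecode(raw)

# The filesystem encoding round-trips arbitrary bytes on Unix.
assert tobytes_fs(value) == raw

# A byte that could not be decoded shows up as a lone surrogate, which
# latin-1 cannot encode. This is the kind of value that crashed before.
try:
    tobytes_latin1('\udcff')
except UnicodeEncodeError as exc:
    print('latin-1 failed:', exc.reason)
```

This is the same idea as the Unix branch of testenvencoding above, just without the request parsing around it.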

Re: [PATCH 2 of 2 stable] hgweb: encode WSGI environment like OS environment

Yuya Nishihara
On Thu, 25 Jun 2020 05:14:00 +0200, Manuel Jacob wrote:

> # HG changeset patch
> # User Manuel Jacob <[hidden email]>
> # Date 1593049567 -7200
> #      Thu Jun 25 03:46:07 2020 +0200
> # Branch stable
> # Node ID c115cca2d19d55c2538def5c95a68ceff597f45d
> # Parent  8f730a30fb20a104bbf5665e1f7d0d4e4aaedf6f
> # EXP-Topic cgi_env_encoding
> hgweb: encode WSGI environment like OS environment
>
> Previously, the WSGI environment keys and values were encoded using latin-1.
> This resulted in a crash if a WSGI environment key or value could not be encoded
> using latin-1.
>
> On Unix, the OS environment is byte-based. Therefore we should do the reverse of
> what Python does for os.environ.
>
> On Windows, there’s no native byte-based OS environment. Therefore we should do
> the same as what mercurial.encoding does with the OS environment.
>
> diff --git a/mercurial/hgweb/request.py b/mercurial/hgweb/request.py
> --- a/mercurial/hgweb/request.py
> +++ b/mercurial/hgweb/request.py
> @@ -8,10 +8,13 @@
>  
>  from __future__ import absolute_import
>  
> +import sys
> +
>  # import wsgiref.validate
>  
>  from ..thirdparty import attr
>  from .. import (
> +    encoding,
>      error,
>      pycompat,
>      util,
> @@ -162,10 +165,18 @@
>      # strings on Python 3 must be between \00000-\000FF. We deal with bytes
>      # in Mercurial, so mass convert string keys and values to bytes.
>      if pycompat.ispy3:
> +        fsencoding = sys.getfilesystemencoding()
> +
>          def tobytes(s):
>              if not isinstance(s, str):
>                  return s
> -            return s.encode('latin-1')
> +            if pycompat.iswindows:
> +                # This is what mercurial.encoding does for os.environ on Windows.
> +                return encoding.strtolocal(s)
> +            else:
> +                # This is what is documented to be used for os.environ on Unix.
> +                return s.encode(fsencoding, 'surrogateescape')

This can be pycompat.fsencode(), which I think is more widely used in our
codebase.
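
For illustration, the hand-written spelling in the patch and os.fsencode() should agree on Unix. This is a minimal sketch assuming Python 3 on Unix, and assuming that pycompat.fsencode is a thin alias for os.fsencode on Python 3 (not verified here); fsencode_by_hand is a hypothetical name.

```python
import os
import sys


def fsencode_by_hand(s):
    # Spelling used in the Unix branch of the patch under review.
    return s.encode(sys.getfilesystemencoding(), 'surrogateescape')


# One plain ASCII key and one value carrying an escaped undecodable byte.
samples = ['PATH_INFO', '/repo/caf\udce9']
for s in samples:
    assert os.fsencode(s) == fsencode_by_hand(s)
```

On Windows the filesystem error handler differs, so the comparison is only meant for the Unix branch.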

Re: [PATCH 2 of 2 stable] hgweb: encode WSGI environment like OS environment

Manuel Jacob
On 2020-06-25 15:44, Yuya Nishihara wrote:

> On Thu, 25 Jun 2020 05:14:00 +0200, Manuel Jacob wrote:
>> # HG changeset patch
>> # User Manuel Jacob <[hidden email]>
>> # Date 1593049567 -7200
>> #      Thu Jun 25 03:46:07 2020 +0200
>> # Branch stable
>> # Node ID c115cca2d19d55c2538def5c95a68ceff597f45d
>> # Parent  8f730a30fb20a104bbf5665e1f7d0d4e4aaedf6f
>> # EXP-Topic cgi_env_encoding
>> hgweb: encode WSGI environment like OS environment
>>
>> Previously, the WSGI environment keys and values were encoded using latin-1.
>> This resulted in a crash if a WSGI environment key or value could not be encoded
>> using latin-1.
>>
>> On Unix, the OS environment is byte-based. Therefore we should do the reverse of
>> what Python does for os.environ.
>>
>> On Windows, there’s no native byte-based OS environment. Therefore we should do
>> the same as what mercurial.encoding does with the OS environment.
>>
>> diff --git a/mercurial/hgweb/request.py b/mercurial/hgweb/request.py
>> --- a/mercurial/hgweb/request.py
>> +++ b/mercurial/hgweb/request.py
>> @@ -8,10 +8,13 @@
>>
>>  from __future__ import absolute_import
>>
>> +import sys
>> +
>>  # import wsgiref.validate
>>
>>  from ..thirdparty import attr
>>  from .. import (
>> +    encoding,
>>      error,
>>      pycompat,
>>      util,
>> @@ -162,10 +165,18 @@
>>      # strings on Python 3 must be between \00000-\000FF. We deal with bytes
>>      # in Mercurial, so mass convert string keys and values to bytes.
>>      if pycompat.ispy3:
>> +        fsencoding = sys.getfilesystemencoding()
>> +
>>          def tobytes(s):
>>              if not isinstance(s, str):
>>                  return s
>> -            return s.encode('latin-1')
>> +            if pycompat.iswindows:
>> +                # This is what mercurial.encoding does for os.environ on Windows.
>> +                return encoding.strtolocal(s)
>> +            else:
>> +                # This is what is documented to be used for os.environ on Unix.
>> +                return s.encode(fsencoding, 'surrogateescape')
>
> This can be pycompat.fsencode(), which I think is more widely used in our
> codebase.

I wanted to follow the documentation more closely. I’ll send another
patch. Choose whichever you like more. :)

[PATCH v2] hgweb: encode WSGI environment like OS environment

Manuel Jacob
In reply to Yuya Nishihara
# HG changeset patch
# User Manuel Jacob <[hidden email]>
# Date 1593049567 -7200
#      Thu Jun 25 03:46:07 2020 +0200
# Branch stable
# Node ID 5adbe0742aac22c15d04d143a08b854c77e4ea7c
# Parent  9a3cc406efed4d1b11ccf91ede87fe8615672902
# EXP-Topic cgi_env_encoding
hgweb: encode WSGI environment like OS environment

Previously, the WSGI environment keys and values were encoded using latin-1.
This resulted in a crash if a WSGI environment key or value could not be encoded
using latin-1.

On Unix, the OS environment is byte-based. Therefore we should do the reverse of
what Python does for os.environ.

On Windows, there’s no native byte-based OS environment. Therefore we should do
the same as what mercurial.encoding does with the OS environment.

diff --git a/mercurial/hgweb/request.py b/mercurial/hgweb/request.py
--- a/mercurial/hgweb/request.py
+++ b/mercurial/hgweb/request.py
@@ -8,10 +8,13 @@
 
 from __future__ import absolute_import
 
+import os
+
 # import wsgiref.validate
 
 from ..thirdparty import attr
 from .. import (
+    encoding,
     error,
     pycompat,
     util,
@@ -162,10 +165,17 @@
     # strings on Python 3 must be between \00000-\000FF. We deal with bytes
     # in Mercurial, so mass convert string keys and values to bytes.
     if pycompat.ispy3:
+
         def tobytes(s):
             if not isinstance(s, str):
                 return s
-            return s.encode('latin-1')
+            if pycompat.iswindows:
+                # This is what mercurial.encoding does for os.environ on Windows.
+                return encoding.strtolocal(s)
+            else:
+                # This is what is documented to be used for os.environ on Unix.
+                return os.fsencode(s)
+
         env = {tobytes(k): tobytes(v) for k, v in pycompat.iteritems(env)}
 
     # Some hosting solutions are emulating hgwebdir, and dispatching directly
diff --git a/tests/test-wsgirequest.py b/tests/test-wsgirequest.py
--- a/tests/test-wsgirequest.py
+++ b/tests/test-wsgirequest.py
@@ -3,7 +3,7 @@
 import unittest
 
 from mercurial.hgweb import request as requestmod
-from mercurial import error
+from mercurial import error, pycompat
 
 DEFAULT_ENV = {
     'REQUEST_METHOD': 'GET',
@@ -432,6 +432,18 @@
         self.assertEqual(r.dispatchpath, b'path1/path2')
         self.assertEqual(r.reponame, b'repo')
 
+    def testenvencoding(self):
+        if pycompat.iswindows:
+            # On Windows, we can't generally know which non-ASCII characters
+            # are supported.
+            r = parse(DEFAULT_ENV, extra={'foo': 'bar'})
+            self.assertEqual(r.rawenv[b'foo'], b'bar')
+        else:
+            # Unix is byte-based. Therefore we test all possible bytes.
+            b = b''.join(pycompat.bytechr(i) for i in range(256))
+            r = parse(DEFAULT_ENV, extra={'foo': pycompat.fsdecode(b)})
+            self.assertEqual(r.rawenv[b'foo'], b)
+
 
 if __name__ == '__main__':
     import silenttestrunner

Re: [PATCH v2] hgweb: encode WSGI environment like OS environment

Yuya Nishihara
On Thu, 25 Jun 2020 17:12:57 +0200, Manuel Jacob wrote:
> # HG changeset patch
> # User Manuel Jacob <[hidden email]>
> # Date 1593049567 -7200
> #      Thu Jun 25 03:46:07 2020 +0200
> # Branch stable
> # Node ID 5adbe0742aac22c15d04d143a08b854c77e4ea7c
> # Parent  9a3cc406efed4d1b11ccf91ede87fe8615672902
> # EXP-Topic cgi_env_encoding
> hgweb: encode WSGI environment like OS environment

Queued this version for stable, thanks.

> +            if pycompat.iswindows:
> +                # This is what mercurial.encoding does for os.environ on Windows.
> +                return encoding.strtolocal(s)
> +            else:
> +                # This is what is documented to be used for os.environ on Unix.
> +                return os.fsencode(s)

Changed to pycompat.fsencode() for consistency.