[PATCH 1 of 2 stable] compat: initialize LC_CTYPE locale on all Python versions and platforms

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

[PATCH 1 of 2 stable] compat: initialize LC_CTYPE locale on all Python versions and platforms

Manuel Jacob
# HG changeset patch
# User Manuel Jacob <[hidden email]>
# Date 1593137270 -7200
#      Fri Jun 26 04:07:50 2020 +0200
# Branch stable
# Node ID 314f194471625fe63adb3e76f9315e82e7f2994b
# Parent  2fd8a8c1127378bce6e6a4205fe7e3c62134cb99
# EXP-Topic init_locale
compat: initialize LC_CTYPE locale on all Python versions and platforms

Previously, the LC_CTYPE locale was not initialized according to user settings
on all Python versions (e.g. never on Python 2) and platforms (e.g. not on
some Python < 3.8 on Windows).

This broke e.g. non-ASCII filenames passed to the Subversion bindings on Python
2, resulting in error messages like "file:///tmp/a%C3%A4 does not look like a
Subversion repository to libsvn version 1.14.0".

The following command could be used to test this functionality. Adding it to the
test suite would be pointless, as the locale is always set to "C" during test
runs.

@command(b'check_initial_codeset', norepo=True)
def check_initial_codeset(ui):
    codeset1 = locale.nl_langinfo(locale.CODESET)
    locale.setlocale(locale.LC_ALL, '')
    codeset2 = locale.nl_langinfo(locale.CODESET)
    assert codeset1 == codeset2

diff --git a/mercurial/pycompat.py b/mercurial/pycompat.py
--- a/mercurial/pycompat.py
+++ b/mercurial/pycompat.py
@@ -13,6 +13,7 @@
 import getopt
 import inspect
 import json
+import locale
 import os
 import shlex
 import sys
@@ -93,6 +94,26 @@
     return _rapply(f, xs)
 
 
+# Passing the '' locale means that the locale should be set according to the
+# user settings (environment variables).
+# Python sometimes avoids setting the global locale settings. When interfacing
+# with C code (e.g. the curses module or the Subversion bindings), the global
+# locale settings must be initialized correctly. Python 2 does not initialize
+# the global locale settings on interpreter startup. Python 3 sometimes
+# initializes LC_CTYPE, but not consistently at least on Windows. Therefore we
+# explicitly initialize it to get consistent behavior if it's not already
+# initialized. Since CPython commit 177d921c8c03d30daa32994362023f777624b10d,
+# LC_CTYPE is always initialized. If we require Python 3.8+, we should re-check
+# if we can remove this code.
+if locale.setlocale(locale.LC_CTYPE, None) == 'C':
+    try:
+        locale.setlocale(locale.LC_CTYPE, '')
+    except locale.Error:
+        # The likely case is that the locale from the environment variables is
+        # unknown.
+        pass
+
+
 if ispy3:
     import builtins
     import codecs

_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

[PATCH 2 of 2 stable] curses: do not initialize LC_ALL to user settings (issue6358)

Manuel Jacob
# HG changeset patch
# User Manuel Jacob <[hidden email]>
# Date 1593157054 -7200
#      Fri Jun 26 09:37:34 2020 +0200
# Branch stable
# Node ID 0a2e20956b859460c5ab22877f46c50030ab0120
# Parent  314f194471625fe63adb3e76f9315e82e7f2994b
# EXP-Topic init_locale
curses: do not initialize LC_ALL to user settings (issue6358)

701341f57ceb moved the setlocale() call to right before curses was used. This
didn’t fully solve the problem it was supposed to solve (locale-dependent
functions, like date formatting/parsing), but only postponed it.

Initializing LC_CTYPE seems to be sufficient for curses to work correctly.
Luckily this is already done at interpreter startup on modern Python versions
and, since recently, by Mercurial in the pycompat module in all other cases.

diff --git a/hgext/histedit.py b/hgext/histedit.py
--- a/hgext/histedit.py
+++ b/hgext/histedit.py
@@ -1710,10 +1710,6 @@
         ctxs = []
         for i, r in enumerate(revs):
             ctxs.append(histeditrule(ui, repo[r], i))
-        # Curses requires setting the locale or it will default to the C
-        # locale. This sets the locale to the user's default system
-        # locale.
-        locale.setlocale(locale.LC_ALL, '')
         rc = curses.wrapper(functools.partial(_chisteditmain, repo, ctxs))
         curses.echo()
         curses.endwin()
diff --git a/mercurial/crecord.py b/mercurial/crecord.py
--- a/mercurial/crecord.py
+++ b/mercurial/crecord.py
@@ -574,9 +574,6 @@
     """
     ui.write(_(b'starting interactive selection\n'))
     chunkselector = curseschunkselector(headerlist, ui, operation)
-    # This is required for ncurses to display non-ASCII characters in
-    # default user locale encoding correctly.  --immerrr
-    locale.setlocale(locale.LC_ALL, '')
     origsigtstp = sentinel = object()
     if util.safehasattr(signal, b'SIGTSTP'):
         origsigtstp = signal.getsignal(signal.SIGTSTP)
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

[PATCH stable] curses: do not initialize LC_ALL to user settings (issue6358)

Manuel Jacob
In reply to this post by Manuel Jacob
# HG changeset patch
# User Manuel Jacob <[hidden email]>
# Date 1593157054 -7200
#      Fri Jun 26 09:37:34 2020 +0200
# Branch stable
# Node ID 36dcc87c0c15d8757ef0322c820861b1d10350c1
# Parent  9bb2fc4ac22b0b986d56fd13be4e6b524af7b648
# EXP-Topic init_locale
curses: do not initialize LC_ALL to user settings (issue6358)

701341f57ceb moved the setlocale() call to right before curses was used. This
didn’t fully solve the problem it was supposed to solve (locale-dependent
functions, like date formatting/parsing), but only postponed it.

Initializing LC_CTYPE seems to be sufficient for curses to work correctly.
Luckily this is already done at interpreter startup on modern Python versions
and, since recently, by Mercurial in the pycompat module in all other cases.

diff --git a/hgext/histedit.py b/hgext/histedit.py
--- a/hgext/histedit.py
+++ b/hgext/histedit.py
@@ -201,7 +201,6 @@
     termios = None
 
 import functools
-import locale
 import os
 import struct
 
@@ -1710,10 +1709,6 @@
         ctxs = []
         for i, r in enumerate(revs):
             ctxs.append(histeditrule(ui, repo[r], i))
-        # Curses requires setting the locale or it will default to the C
-        # locale. This sets the locale to the user's default system
-        # locale.
-        locale.setlocale(locale.LC_ALL, '')
         rc = curses.wrapper(functools.partial(_chisteditmain, repo, ctxs))
         curses.echo()
         curses.endwin()
diff --git a/mercurial/crecord.py b/mercurial/crecord.py
--- a/mercurial/crecord.py
+++ b/mercurial/crecord.py
@@ -10,7 +10,6 @@
 
 from __future__ import absolute_import
 
-import locale
 import os
 import re
 import signal
@@ -574,9 +573,6 @@
     """
     ui.write(_(b'starting interactive selection\n'))
     chunkselector = curseschunkselector(headerlist, ui, operation)
-    # This is required for ncurses to display non-ASCII characters in
-    # default user locale encoding correctly.  --immerrr
-    locale.setlocale(locale.LC_ALL, '')
     origsigtstp = sentinel = object()
     if util.safehasattr(signal, b'SIGTSTP'):
         origsigtstp = signal.getsignal(signal.SIGTSTP)
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

[PATCH stable v2] curses: do not initialize LC_ALL to user settings (issue6358)

Manuel Jacob
In reply to this post by Manuel Jacob
# HG changeset patch
# User Manuel Jacob <[hidden email]>
# Date 1593157054 -7200
#      Fri Jun 26 09:37:34 2020 +0200
# Branch stable
# Node ID 36dcc87c0c15d8757ef0322c820861b1d10350c1
# Parent  9bb2fc4ac22b0b986d56fd13be4e6b524af7b648
# EXP-Topic init_locale
curses: do not initialize LC_ALL to user settings (issue6358)

701341f57ceb moved the setlocale() call to right before curses was used. This
didn’t fully solve the problem it was supposed to solve (locale-dependent
functions, like date formatting/parsing), but only postponed it.

Initializing LC_CTYPE seems to be sufficient for curses to work correctly.
Luckily this is already done at interpreter startup on modern Python versions
and, since recently, by Mercurial in the pycompat module in all other cases.

diff --git a/hgext/histedit.py b/hgext/histedit.py
--- a/hgext/histedit.py
+++ b/hgext/histedit.py
@@ -201,7 +201,6 @@
     termios = None
 
 import functools
-import locale
 import os
 import struct
 
@@ -1710,10 +1709,6 @@
         ctxs = []
         for i, r in enumerate(revs):
             ctxs.append(histeditrule(ui, repo[r], i))
-        # Curses requires setting the locale or it will default to the C
-        # locale. This sets the locale to the user's default system
-        # locale.
-        locale.setlocale(locale.LC_ALL, '')
         rc = curses.wrapper(functools.partial(_chisteditmain, repo, ctxs))
         curses.echo()
         curses.endwin()
diff --git a/mercurial/crecord.py b/mercurial/crecord.py
--- a/mercurial/crecord.py
+++ b/mercurial/crecord.py
@@ -10,7 +10,6 @@
 
 from __future__ import absolute_import
 
-import locale
 import os
 import re
 import signal
@@ -574,9 +573,6 @@
     """
     ui.write(_(b'starting interactive selection\n'))
     chunkselector = curseschunkselector(headerlist, ui, operation)
-    # This is required for ncurses to display non-ASCII characters in
-    # default user locale encoding correctly.  --immerrr
-    locale.setlocale(locale.LC_ALL, '')
     origsigtstp = sentinel = object()
     if util.safehasattr(signal, b'SIGTSTP'):
         origsigtstp = signal.getsignal(signal.SIGTSTP)
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH stable v2] curses: do not initialize LC_ALL to user settings (issue6358)

Yuya Nishihara
On Fri, 26 Jun 2020 10:35:24 +0200, Manuel Jacob wrote:
> # HG changeset patch
> # User Manuel Jacob <[hidden email]>
> # Date 1593157054 -7200
> #      Fri Jun 26 09:37:34 2020 +0200
> # Branch stable
> # Node ID 36dcc87c0c15d8757ef0322c820861b1d10350c1
> # Parent  9bb2fc4ac22b0b986d56fd13be4e6b524af7b648
> # EXP-Topic init_locale
> curses: do not initialize LC_ALL to user settings (issue6358)

Queued these to default, thanks. The issue to be fixed is relatively small,
but the setlocale() change might have unwanted side effect.
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH stable v2] curses: do not initialize LC_ALL to user settings (issue6358)

Manuel Jacob
On 2020-06-26 13:47, Yuya Nishihara wrote:

> On Fri, 26 Jun 2020 10:35:24 +0200, Manuel Jacob wrote:
>> # HG changeset patch
>> # User Manuel Jacob <[hidden email]>
>> # Date 1593157054 -7200
>> #      Fri Jun 26 09:37:34 2020 +0200
>> # Branch stable
>> # Node ID 36dcc87c0c15d8757ef0322c820861b1d10350c1
>> # Parent  9bb2fc4ac22b0b986d56fd13be4e6b524af7b648
>> # EXP-Topic init_locale
>> curses: do not initialize LC_ALL to user settings (issue6358)
>
> Queued these to default, thanks. The issue to be fixed is relatively
> small,
> but the setlocale() change might have unwanted side effect.

As mentioned in the first patch, non-ASCII filenames in
hgext.convert.subversion are fixed by it. If I make further non-ASCII
fixes to hgext.convert.subversion, should do they go to stable (for
testing, I can ensure that LC_CTYPE is initialized otherwise on Python
2) or default? For me as a user, it doesn’t make a difference, as I
don’t use Subversion (especially not with non-ASCII filenames).
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH stable v2] curses: do not initialize LC_ALL to user settings (issue6358)

Yuya Nishihara
On Fri, 26 Jun 2020 14:42:58 +0200, Manuel Jacob wrote:

> On 2020-06-26 13:47, Yuya Nishihara wrote:
> > On Fri, 26 Jun 2020 10:35:24 +0200, Manuel Jacob wrote:
> >> # HG changeset patch
> >> # User Manuel Jacob <[hidden email]>
> >> # Date 1593157054 -7200
> >> #      Fri Jun 26 09:37:34 2020 +0200
> >> # Branch stable
> >> # Node ID 36dcc87c0c15d8757ef0322c820861b1d10350c1
> >> # Parent  9bb2fc4ac22b0b986d56fd13be4e6b524af7b648
> >> # EXP-Topic init_locale
> >> curses: do not initialize LC_ALL to user settings (issue6358)
> >
> > Queued these to default, thanks. The issue to be fixed is relatively
> > small,
> > but the setlocale() change might have unwanted side effect.
>
> As mentioned in the first patch, non-ASCII filenames in
> hgext.convert.subversion are fixed by it. If I make further non-ASCII
> fixes to hgext.convert.subversion, should do they go to stable (for
> testing, I can ensure that LC_CTYPE is initialized otherwise on Python
> 2) or default? For me as a user, it doesn’t make a difference, as I
> don’t use Subversion (especially not with non-ASCII filenames).

If the scope of the change is narrowed down to the conversion extension,
I think it may go stable. As a user, I never use non-ASCII filenames, too.
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH stable v2] curses: do not initialize LC_ALL to user settings (issue6358)

Yuya Nishihara
In reply to this post by Manuel Jacob
On Fri, 26 Jun 2020 10:35:24 +0200, Manuel Jacob wrote:

> # HG changeset patch
> # User Manuel Jacob <[hidden email]>
> # Date 1593157054 -7200
> #      Fri Jun 26 09:37:34 2020 +0200
> # Branch stable
> # Node ID 36dcc87c0c15d8757ef0322c820861b1d10350c1
> # Parent  9bb2fc4ac22b0b986d56fd13be4e6b524af7b648
> # EXP-Topic init_locale
> curses: do not initialize LC_ALL to user settings (issue6358)
>
> 701341f57ceb moved the setlocale() call to right before curses was used. This
> didn’t fully solve the problem it was supposed to solve (locale-dependent
> functions, like date formatting/parsing), but only postponed it.
>
> Initializing LC_CTYPE seems to be sufficient for curses to work correctly.
> Luckily this is already done at interpreter startup on modern Python versions
> and, since recently, by Mercurial in the pycompat module in all other cases.

Sorry to be a bit late, but I noticed a problem on Python 2.

LC_CTYPE changes the character classification rules, which unfortunately
changes the behavior of bytes.is*() functions on Python 2.

https://man7.org/linux/man-pages/man7/locale.7.html
https://docs.python.org/2.7/library/stdtypes.html?highlight=isalpha#str.isalnum

  % python -c 'import locale; locale.setlocale(locale.LC_CTYPE, "de_DE"); print(b"\xe4".isalpha())'
  True
  % python3 -c 'import locale; locale.setlocale(locale.LC_CTYPE, "de_DE"); print(b"\xe4".isalpha())'
  False

So, on Python 2, this patch would make template syntax locale-dependent for
example.
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH stable v2] curses: do not initialize LC_ALL to user settings (issue6358)

Manuel Jacob
On 2020-06-27 05:27, Yuya Nishihara wrote:

> On Fri, 26 Jun 2020 10:35:24 +0200, Manuel Jacob wrote:
>> # HG changeset patch
>> # User Manuel Jacob <[hidden email]>
>> # Date 1593157054 -7200
>> #      Fri Jun 26 09:37:34 2020 +0200
>> # Branch stable
>> # Node ID 36dcc87c0c15d8757ef0322c820861b1d10350c1
>> # Parent  9bb2fc4ac22b0b986d56fd13be4e6b524af7b648
>> # EXP-Topic init_locale
>> curses: do not initialize LC_ALL to user settings (issue6358)
>>
>> 701341f57ceb moved the setlocale() call to right before curses was
>> used. This
>> didn’t fully solve the problem it was supposed to solve
>> (locale-dependent
>> functions, like date formatting/parsing), but only postponed it.
>>
>> Initializing LC_CTYPE seems to be sufficient for curses to work
>> correctly.
>> Luckily this is already done at interpreter startup on modern Python
>> versions
>> and, since recently, by Mercurial in the pycompat module in all other
>> cases.
>
> Sorry to be a bit late, but I noticed a problem on Python 2.
>
> LC_CTYPE changes the character classification rules, which
> unfortunately
> changes the behavior of bytes.is*() functions on Python 2.
>
> https://man7.org/linux/man-pages/man7/locale.7.html
> https://docs.python.org/2.7/library/stdtypes.html?highlight=isalpha#str.isalnum
>
>   % python -c 'import locale; locale.setlocale(locale.LC_CTYPE,
> "de_DE"); print(b"\xe4".isalpha())'
>   True
>   % python3 -c 'import locale; locale.setlocale(locale.LC_CTYPE,
> "de_DE"); print(b"\xe4".isalpha())'
>   False
>
> So, on Python 2, this patch would make template syntax locale-dependent
> for
> example.

I spent quite some time trying to find out why Python 2 doesn’t
initialize LC_CTYPE. Now I know!

For curses, we can probably work around the problem by resetting
LC_CTYPE after calling initscr() or after we close the curses UI.

For Subversion it seems like LC_CTYPE must be set all the time instead
of only during initialization, but I have to double-check.

I can prepare more patches during this weekend. If that’s too long, feel
free to backout the existing changes.
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel