
Can't add any non-ascii files on mercurial for Windows?


Can't add any non-ascii files on mercurial for Windows?

Nathan Davis
Hi,

I posted the following question: http://stackoverflow.com/questions/12540247/unicode-filenames-on-windows-mercurial-2-5-or-future and was redirected to this mailing list.  The person who answered seemed to think that I should be able to add files with non-ascii filenames just fine on Windows, so long as I didn't need to be cross-platform.

However, I cannot even add a file with (for example) Chinese characters in its filename to a local repository. In TortoiseHg, it shows up as all ? characters, and attempting to add it results in "[Error 123] The filename, directory name, or volume label syntax is incorrect: 'C:\\projects\\utf8test\\???.txt'". And from the command line, if I run `hg add *.txt`, I get "???.txt: The filename, directory name, or volume label syntax is incorrect".

I am using TortoiseHg 2.5 / Mercurial 2.3.1 on Windows XP.  The test filename I used was "你好吗.txt" (that is, Unicode code points U+4F60 U+597D U+5417).

Does Mercurial support non-ASCII filenames on Windows at all, and if so, under what conditions?

Thanks in advance.

_______________________________________________
Mercurial mailing list
[hidden email]
http://selenic.com/mailman/listinfo/mercurial

Re: Can't add any non-ascii files on mercurial for Windows?

Matt Mackall
On Sun, 2012-09-23 at 15:30 -0700, Nathan Davis wrote:

> Hi,
>
> I posted the following question: http://stackoverflow.com/questions/12540247/unicode-filenames-on-windows-mercurial-2-5-or-future and was redirected to this mailing list.  The person who answered seemed to think that I should be able to add files with non-ascii filenames just fine on Windows, so long as I didn't need to be cross-platform.
>
>
> However, I cannot even add a file with (for example) Chinese
> characters in a filename to a local repository. From TortoiseHg, it
> shows up as all ? characters, and attempting to add it results in
> "[Error 123] The filename, directory name, or volume label syntax is
> incorrect: 'C:\\projects\\utf8test\\???.txt'". And from the command
> line if I run `hg add *.txt`, I get "???.txt: The filename, directory
> name, or volume label syntax is incorrect".

What ANSI code page is your system running in? For now, if it isn't one
that supports Chinese characters (e.g. cp936 or cp950), you can't check
in such files because they're invisible to the ANSI C filesystem APIs.

See:

http://mercurial.selenic.com/wiki/EncodingStrategy
http://mercurial.selenic.com/wiki/WindowsUTF8Plan
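
The code page limitation described above can be illustrated with a minimal Python sketch. It uses Python's own codec tables as a stand-in for the Windows code pages, so it is only an analogy for what the ANSI filesystem APIs do, not a reproduction of them. The helper names `can_encode` and `lossy_encode` are made up for this illustration.

```python
# The test filename from the original post: U+4F60 U+597D U+5417 plus ".txt".
FILENAME = "\u4f60\u597d\u5417.txt"


def can_encode(name: str, codepage: str) -> bool:
    """Return True if every character of `name` exists in `codepage`."""
    try:
        name.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


def lossy_encode(name: str, codepage: str) -> bytes:
    """Encode `name`, substituting '?' for characters the code page lacks."""
    return name.encode(codepage, errors="replace")


if __name__ == "__main__":
    # cp437 and cp1252 are the OEM and ANSI code pages reported in this
    # thread; cp936 is one of the Chinese code pages mentioned above.
    for cp in ("cp437", "cp1252", "cp936"):
        print(cp, can_encode(FILENAME, cp), lossy_encode(FILENAME, cp))
```

Under a Western code page each Chinese character collapses to a "?", which matches the "???.txt" name in the error messages: the byte string that reaches the filesystem no longer identifies the real file.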

--
Mathematics is the supreme nostalgia of our time.



Re: Can't add any non-ascii files on mercurial for Windows?

Nathan Davis
Thanks for your reply.

My default Codepage is 437 (HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage\OEMCP), but I also (on the command line via chcp) experimented with a few other code pages (50227, 50229, 65001) which didn't change the behavior any.

So, it sounds like, at least for now, I am out of luck.

On the other hand, I saw that FUJIWARA Katsunori (http://mercurial.markmail.org/search/?q=fro%3A%22FUJIWARA+Katsunori%22+list%3Acom.selenic.mercurial-devel+vfs) is working on supporting UTF-8 filenames on Windows, and the links you provided seem to indicate that Mercurial plans to fully incorporate this in the future.

I know no one can give a date for when support will be released, but is there some indication of how far along the Windows UTF8 plan is (e.g. just a proposal, significant coding done, in testing, or some such)?



Re: Can't add any non-ascii files on mercurial for Windows?

Tony Mechelynck
On 25/09/12 04:09, Nathan Davis wrote:
> Thanks for your reply.
>
> My default Codepage is 437
> (HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage\OEMCP),
> but I also (on the command line via chcp) experimented with a few other
> code pages (50227, 50229, 65001) which didn't change the behavior any.
>
> So, it sounds like, at least for now, I am out of luck.
[...]

Try 10646. IIRC that's the Windows codepage for Unicode (not sure if
it's UTF-8 or UTF-16le, and the name comes from ISO/IEC 10646 which is
also Unicode, but from the ISO point of view).


Best regards,
Tony.
--
A non-vegetarian anti-abortionist is a contradiction in terms.
                -- Phyllis Schlafly



Re: Can't add any non-ascii files on mercurial for Windows?

Matt Mackall
On Tue, 2012-09-25 at 16:11 +0200, Tony Mechelynck wrote:

> On 25/09/12 04:09, Nathan Davis wrote:
> > Thanks for your reply.
> >
> > My default Codepage is 437
> > (HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage\OEMCP),
> > but I also (on the command line via chcp) experimented with a few other
> > code pages (50227, 50229, 65001) which didn't change the behavior any.
> >
> > So, it sounds like, at least for now, I am out of luck.
> [...]
>
> Try 10646. IIRC that's the Windows codepage for Unicode (not sure if
> it's UTF-8 or UTF-16le, and the name comes from ISO/IEC 10646 which is
> also Unicode, but from the ISO point of view).

That'd be cp65001, which can be mostly made to work.

Also note that Mercurial ignores the _OEM_ code page (typically used by
console apps) and only looks at the _ANSI_ code page (typically used by
GUI apps and filesystem APIs).
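
To see both values side by side, here is a small sketch that asks Windows for its ANSI and OEM code pages through the Win32 functions `GetACP` and `GetOEMCP`. The wrapper name `windows_code_pages` is made up for this illustration, and the function simply returns None when not run on Windows.

```python
import ctypes
import sys


def windows_code_pages():
    """Return (ansi_code_page, oem_code_page) on Windows, else None."""
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.windll.kernel32
    # GetACP: the ANSI code page, the one described above as relevant.
    # GetOEMCP: the OEM code page, typically used by console programs.
    return kernel32.GetACP(), kernel32.GetOEMCP()


if __name__ == "__main__":
    pages = windows_code_pages()
    if pages is None:
        print("Not running on Windows; no code pages to report.")
    else:
        print("ANSI code page: %d, OEM code page: %d" % pages)
```

On the system described in this thread, this would be expected to report the two registry values mentioned (ACP and OEMCP). Changing the console code page with chcp affects neither of them.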

--
Mathematics is the supreme nostalgia of our time.



Re: Can't add any non-ascii files on mercurial for Windows?

Nathan Davis
10646 seems to be an invalid code page (at least chcp doesn't recognize it).

Also, I tried changing HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage\ACP from its current value of 1252 to 65001.  Doing so made the system unable to boot.

You said that cp65001 can be mostly made to work, but I have not had success with it.  Can you elaborate?



Re: Can't add any non-ascii files on mercurial for Windows?

Matt Mackall
On Tue, 2012-09-25 at 19:09 -0700, Nathan Davis wrote:
> 10646 seems to be an invalid code page (at least chcp doesn't
> recognize it)

Yeah, Microsoft has their own numbering which has no relation to ISO.

> Also, I tried changing HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet
> \Control\Nls\CodePage\ACP from its current value of 1252 to 65001.
> Doing so made the system unable to boot.

Heh. Yes, Windows' internal UTF-8 handling is... thin.

> You said that cp65001 can be mostly made to work, but I have not had
> success with it.  Can you elaborate?

It's non-trivial and unsupported. You need to set the OEM code page,
then use the magic SetFileApisToOEM() kernel32 call. And then you'll have
a handful of interesting font/console/wincrt bugs to work around. If you
want to become a Windows character encoding guru... this is one good
starting point.
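
For orientation only, the call mentioned above can be reached from Python via ctypes, roughly as sketched below. This shows just the Win32 calls `SetFileApisToOEM` and `AreFileApisANSI`; it is not Mercurial's code and does not address the font, console, or C runtime issues. The wrapper name `switch_file_apis_to_oem` is made up for this illustration, and on non-Windows platforms it does nothing and returns None.

```python
import ctypes
import sys


def switch_file_apis_to_oem():
    """Switch this process's ANSI file APIs to the OEM code page.

    Returns the result of AreFileApisANSI() after the switch as a bool,
    or None when not running on Windows.
    """
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.windll.kernel32
    # SetFileApisToOEM takes no arguments and returns nothing.
    kernel32.SetFileApisToOEM.restype = None
    kernel32.SetFileApisToOEM()
    # AreFileApisANSI reports which code page the file APIs now use.
    return bool(kernel32.AreFileApisANSI())


if __name__ == "__main__":
    print("File APIs still ANSI after switch:", switch_file_apis_to_oem())
```

The setting is per process and only affects the byte-oriented ("A") file functions, so it matters for programs that pass byte-string paths to them. Python 3 itself generally uses the wide-character file functions, so this sketch demonstrates the call rather than changing how Python opens files.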

--
Mathematics is the supreme nostalgia of our time.

