Line endings and whitespace issues on Windows

Line endings and whitespace issues on Windows

Ben Sizer-3

Hello all,

 

I’ve been using Mercurial on Windows via TortoiseHg on a large repository converted from CVS and for the most part it works well. We have a central repo hosted on Linux which we push to over ssh and we develop on Windows. However, there are persistent problems with line endings, and possibly whitespace generally, which my colleagues and I don’t seem to be able to resolve readily.

 

Firstly, we were finding that Visual Studio kept telling us that files had inconsistent line endings. This never happened before with CVS so it would appear to be something Mercurial was doing to the files, since nothing else has really changed. We still edit exclusively in Visual Studio, with the only other tool performing modifications being Mercurial/TortoiseHg. We couldn’t find out why this occurred, so we installed the hgeol extension and set it to convert all our source files to ‘native’ format, in the hope that this might help. Unfortunately this doesn’t seem to have fixed the problem: we still occasionally get the inconsistency message from Visual Studio, and are occasionally prompted to check in files with whitespace changes on every line, indicating that the line endings have been altered. (As you can imagine, this tends to muddy the change log and make annotation difficult.)

 

Secondly, change detection seems not to work the way I would expect, and I can’t help but feel that this is related to whitespace or line endings too. One of our source generation tools regenerated a file with identical content, but it shows up in TortoiseHg as having been altered. A check with the command line tool shows that the file is indeed listed as modified by “hg status”. When I try “hg diff” on that file, no change is reported. (I spotted FAQ 4.9: “hg status shows changed files but hg diff doesn't!”, and tried the --git flag, but it still reports no difference.) It can’t be going on the timestamp, surely?

 

So, does anybody have any idea why:

 - Mercurial appears to do something funny to the line endings

 - The eol extension doesn’t appear to be converting things consistently

 - Hg status says a file has changes but hg diff suggests that it doesn’t?

 

Any help or suggestions appreciated!

 

--

Ben Sizer

 


_______________________________________________
Mercurial mailing list
[hidden email]
http://selenic.com/mailman/listinfo/mercurial

Re: Line endings and whitespace issues on Windows

cboos
On 9/2/2010 6:37 PM, Ben Sizer wrote:

> So, does anybody have any idea why:
>
>  - Mercurial appears to do something funny to the line endings
>
>  - The eol extension doesn’t appear to be converting things consistently
>
>  - Hg status says a file has changes but hg diff suggests that it doesn’t?


I've seen a similar effect, also on Windows, and it does seem to be due to the eol extension, when a file has CRLF line endings *in the repository* and is associated with 'native' in .hgeol. However, in my case 'hg diff' did show the non-existent differences (using hg 1.6). I say non-existent because the bytes in the repository and in the file system were the same, both in 'native' format (CRLF on Windows).

Incidentally, this happened on the .hgeol file itself which somehow ended up with \r\n line endings, and contains a "** = native" pattern which matches itself. If I remove that pattern, hg status won't show a spurious change.

 

> Any help or suggestions appreciated!

 


After disabling the eol extension, 'hg status' reports no change, as it should.
The funny effect is that now the dirstate caches this state, so when re-enabling the extension, 'hg status' continues to work as expected, as long as the cache holds.

I don't know if this is by design (every file should have LF endings only, in the repository) or a bug in the eol extension.
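The observation that the dirstate "caches this state" fits the general behaviour of a stat cache: if a file's recorded size and modification time still match, there is no reason to reread (and re-filter) its contents. The following is a deliberately simplified model of that idea, not Mercurial's actual dirstate code; `CachedEntry` and `status_says_modified` are hypothetical names, and the real logic handles more cases.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CachedEntry:
    """What a stat cache remembers about a clean file (simplified)."""
    size: int
    mtime: int


def status_says_modified(cached: CachedEntry, size: int, mtime: int,
                         content_differs: Callable[[], bool]) -> bool:
    """Hypothetical stat-cache check; not Mercurial's real implementation."""
    if size == cached.size and mtime == cached.mtime:
        # Stat data unchanged: trust the cache and never read the file,
        # so a filter that would now change its bytes is never consulted.
        return False
    # Stat data changed: fall back to comparing the (filtered) contents.
    return content_differs()


entry = CachedEntry(size=6, mtime=1283450000)
# Re-enabling a filter that would alter the bytes goes unnoticed...
print(status_says_modified(entry, 6, 1283450000, lambda: True))
# ...until the file is touched and its mtime changes.
print(status_says_modified(entry, 6, 1283450099, lambda: True))
```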

The problem, if it is considered to be one, can be reproduced using the following script, with the eol extension active, on Windows (so native == CRLF):

  $ hg init eol-issue
  $ cd eol-issue
  $ echo -e "CRLF\r" > crlf
  $ hg ci -Am "file with crlf in repos"
  adding crlf
  $ hg status
  $ echo "[patterns]" > .hgeol
  $ echo "** = native" >> .hgeol
  $ hg status
  M crlf
  ? .hgeol

But:

  $ hg diff
  diff -r 0b6d36c04da6 crlf
  --- a/crlf      Thu Sep 02 20:48:09 2010 +0200
  +++ b/crlf      Thu Sep 02 20:51:48 2010 +0200
  @@ -1,1 +1,1 @@
  -CRLF
  +CRLF

So this is maybe not exactly the same problem as reported by Ben, but it seems close enough and is annoying on its own.

-- Christian


Re: Line endings and whitespace issues on Windows

Harvey Chapman-9
On Sep 2, 2010, at 3:01 PM, Christian Boos wrote:

> On 9/2/2010 6:37 PM, Ben Sizer wrote:
>
>> Firstly, we were finding that Visual Studio kept telling us that files had inconsistent line endings. This never happened before with CVS so it would



We used to have this problem, but it was Visual Studio causing it (also, we were using Subversion).



RE: Line endings and whitespace issues on Windows

Ben Sizer-3
In reply to this post by cboos
Christian Boos wrote:
 
> > Ben Sizer wrote:
> > Firstly, we were finding that Visual Studio kept telling us that files
> > had inconsistent line endings. This never happened before with CVS so it
> > would appear to be something Mercurial was doing to the files, since nothing
> > else has really changed.
[...]
> > Secondly, change detection seems not to work the way I would expect, and I can't
> > help but feel that this is related to whitespace or line endings too. One of our
> > source generation tools regenerated a file with identical content, but it shows
> > up in TortoiseHg as having been altered. A check with the command line tool shows
> > that the file is indeed there as modified in "hg status".
> > When I try "hg diff" on that file, there's no change reported.
 
> I've seen a similar effect, also on Windows, and it seems indeed due to the eol
> extension, when a file has CRLF line endings *in the repository* and is associated
> to 'native' in .hgeol.
 
That sounds reasonable, as we probably have CRLF endings stored in the central repo
(prior to the eol extension being installed) but we actually had this same problem
before we used the extension. We added it to try and fix the problem but it didn't
appear to help.
 
> However, in my case 'hg diff' did show the non-existing differences (using hg 1.6).  
 
Yeah, it's strange that in my case there was an invisible difference. I can't see a
good reason why source control would insist a file has changed but can't tell you what
change it wants to commit. So something odd is afoot.
 

> The problem, if it is considered to be one, can be reproduced using the following
> script, with the eol extension active, on Windows (so native == CRLF):
>
>  $ hg init eol-issue
>  $ cd eol-issue
>  $ echo -e "CRLF\r" > crlf
>  $ hg ci -Am "file with crlf in repos"
>  adding crlf
>  $ hg status
>  $ echo "[patterns]" > .hgeol
>  $ echo "** = native" >> .hgeol
>  $ hg status
>  M crlf
>  ? .hgeol
>
>But:
>
>  $ hg diff
>  diff -r 0b6d36c04da6 crlf
>  --- a/crlf      Thu Sep 02 20:48:09 2010 +0200
>  +++ b/crlf      Thu Sep 02 20:51:48 2010 +0200
>  @@ -1,1 +1,1 @@
>  -CRLF
>  +CRLF
 
Interesting test case. I expect my idea of what end of line handling should do may
differ from other people's, but I would have thought that setting up a system to
handle line ending conversions should actually result in fewer commits, not more of
them, i.e. it becomes more tolerant of differing line endings. This seems to be the
opposite, where setting a conversion preference can actually make the system believe
a file has outstanding changes when you've not touched it. Something like this
could possibly explain some of the issues I've been seeing.
 
(PS. Apologies for any formatting issues - I'm wrestling with Outlook here which
doesn't appear to have the same definition of 'plain text' that the rest of the
world has.)
 
 
--
Ben Sizer
 


RE: Line endings and whitespace issues on Windows

Ben Sizer-3
In reply to this post by Harvey Chapman-9
Harvey Chapman wrote:

> Ben Sizer wrote:
> > Firstly, we were finding that Visual Studio kept telling us that files had inconsistent line endings.

> We used to have this problem, but it was Visual Studio causing the problem (also, we were using subversion).

How could you tell it was caused by Visual Studio?

It seems unlikely that, with several developers all using Visual Studio on Windows, it would somehow be inserting anything other than \r\n endings. Indeed when we used CVS, there were never any such warnings. We checked in consistent files and got consistent files out. Even when CVS had to do merges, the line endings would be fine. Obviously it's impossible to rule out the possibility that somehow we never saw the inconsistency and that CVS fixed it all up for us in flight, but it seems more likely that Mercurial is getting something wrong at the merge stage, possibly to do with applying Windows-format changesets on a Linux system when we push; I don't know.

--
Ben Sizer

Re: Line endings and whitespace issues on Windows

Martin Geisler
In reply to this post by Ben Sizer-3
Ben Sizer <[hidden email]> writes:

> Christian Boos wrote:
>
>> The problem, if it is considered to be one, can be reproduced using
>> the following script, with the eol extension active, on Windows (so
>> native == CRLF):
>>
>>  $ hg init eol-issue
>>  $ cd eol-issue
>>  $ echo -e "CRLF\r" > crlf
>>  $ hg ci -Am "file with crlf in repos"
>>  adding crlf
>>  $ hg status
>>  $ echo "[patterns]" > .hgeol
>>  $ echo "** = native" >> .hgeol
>>  $ hg status
>>  M crlf
>>  ? .hgeol
>>
>> But:
>>
>>  $ hg diff
>>  diff -r 0b6d36c04da6 crlf
>>  --- a/crlf      Thu Sep 02 20:48:09 2010 +0200
>>  +++ b/crlf      Thu Sep 02 20:51:48 2010 +0200
>>  @@ -1,1 +1,1 @@
>>  -CRLF
>>  +CRLF
Is this diff now just showing you that the crlf file will be modified by
the next commit? I know you cannot really see the change in line endings
but I think that is more a problem with the diff format.

When you say '** = native', you are asking for files to have native line
endings in the working copy and *LF* line endings in the repository. Use

  [repository]
  native = CRLF

in the .hgeol file if you want to override what the repository-native
line endings should be.

> Interesting test case. I expect my idea of what end of line handling
> should do may differ from other people's, but I would have thought
> that setting up a system to handle line ending conversions should
> actually result in fewer commits, not more of them, i.e. it becomes
> more tolerant of differing line endings. This seems to be the
> opposite, where setting a conversion preference can actually make the
> system believe a file has outstanding changes when you've not touched
> it. Something like this could possibly explain some of the issues I've
> been seeing.

The eol extension will *normalize* the line endings stored in the
repository, so yes, it can certainly be the case that a file now has
outstanding changes when you enable the eol extension.

It would be nice if the extension would take note of the existing line
endings and automatically convert back and forth between them -- see
this issue which was opened just yesterday:

  http://mercurial.selenic.com/bts/issue2355

--
Martin Geisler

Mercurial links: http://mercurial.ch/


RE: Line endings and whitespace issues on Windows

Ben Sizer-3
Martin Geisler wrote:

> The eol extension will *normalize* the line endings stored in the repository,
> so yes, it can certainly be the case that a file now has outstanding changes
> when you enable the eol extension.

Ok. I appreciate that once you enable this option, a conversion needs to take place. However I would suggest that ideally (a) the conversion shouldn't occur until you have a changeset to commit involving that file, and (b) the conversion should not form part of a changeset itself, but rather be something that occurs after the change has been applied to the data and before it hits the disk in the repository. After all, what I want from line-ending handling is for it to be handled transparently, not for there to be changesets with nothing but line-ending alterations there.

Perhaps this was not your use case when developing this extension? Do you have any advice on achieving this or working around it? Or indeed, any insight into what exactly Mercurial is doing with the line-endings? I know the eol extension is not the root cause of our issues as we had the problem before we used it. But it doesn't appear to solve them in a way that we can use either. :)


--
Ben Sizer

Re: Line endings and whitespace issues on Windows

Martin Geisler
Ben Sizer <[hidden email]> writes:

> Martin Geisler wrote:
>
>> The eol extension will *normalize* the line endings stored in the
>> repository, so yes, it can certainly be the case that a file now has
>> outstanding changes when you enable the eol extension.
>
> Ok. I appreciate that once you enable this option, a conversion needs
> to take place. However I would suggest that ideally (a) the conversion
> shouldn't occur until you have a changeset to commit involving that
> file, and (b) the conversion should not form part of a changeset
> itself, but rather be something that occurs after the change has been
> applied to the data and before it hits the disk in the repository.
You cannot have a change that is not part of a changeset -- we have no
"room" to store such a change.

> After all, what I want from line-ending handling is for it to be
> handled transparently, not for there to be changesets with nothing but
> line-ending alterations there.

If you have a file with the "wrong" line endings in the repository, then
there will be one such changeset after you enable the extension. That
should be all.

> Perhaps this was not your use case when developing this extension?

The use case is projects involving people on different platforms where
everybody wants to have native line endings.

> Do you have any advice on achieving this or working around it? Or
> indeed, any insight into what exactly Mercurial is doing with the
> line-endings? I know the eol extension is not the root cause of our
> issues as we had the problem before we used it. But it doesn't appear
> to solve them in a way that we can use either. :)

Well, standard Mercurial won't touch your line endings at all. Mercurial
treats all files as binary and so it gives you back the bytes you gave
it originally.

Perhaps it would be simpler for you to just not use the eol extension
and commit the files with CRLF line endings in the repository? After
all, it's no crime to have CRLF files in a repository :)

--
Martin Geisler

Mercurial links: http://mercurial.ch/


Re: Line endings and whitespace issues on Windows

Andreas Tscharner
In reply to this post by Ben Sizer-3
On 03.09.2010 11:53, Ben Sizer wrote:

[snip]
> other than \r\n endings. Indeed when we used CVS, there were never
> any such warnings. We checked in consistent files and got consistent
> files out. Even when CVS had to do merges, the line endings would be
> fine. Obviously it's impossible to rule out the possibility that
> somehow we never saw the inconsistency and that CVS fixed it all up
> for us in flight, but it seems more likely that  Mercurial is getting

Depends on what CVS you had been using. CVSNT was very good at that
point. Internally, they used the Linux line ending, but on Windows (or
even Mac IIRC), they changed them while checking out or updating...

Best regards
        Andreas
--
Andreas Tscharner   [hidden email]   ICQ-No. 14356454

RE: Line endings and whitespace issues on Windows

Ben Sizer-3
In reply to this post by Martin Geisler
Martin Geisler wrote:

> Ben Sizer wrote:
> Ok. I appreciate that once you enable this option, a conversion needs
> to take place. However I would suggest that ideally (a) the conversion
> shouldn't occur until you have a changeset to commit involving that
> file, and (b) the conversion should not form part of a changeset
> itself, but rather be something that occurs after the change has been
> applied to the data and before it hits the disk in the repository.

> You cannot have a change that is not part of a changeset -- we have no
> "room" to store such a change.

Yeah, that makes sense. In that case, it seems like this is the sort of change that I'd like to happen automatically after an update and before a commit, so that there is no change to what is in the repository, just to what happens to your working copy. At least, that's the use case that I would want.

> > After all, what I want from line-ending handling is for it to be
> > handled transparently, not for there to be changesets with nothing but
> > line-ending alterations there.

> If you have a file with the "wrong" line endings in the repository, then there will be one such changeset after you enable the extension. That should be all.

Unfortunately just one such changeset is enough to ruin hg annotate, polluting it with a visible change to every line that carries little useful information.

Additionally, we're finding there are lots and lots of individual changesets for line ending changes, as we gradually work through various files. This makes tracking the important changes somewhat less convenient.

(This isn't specific to the eol extension. It's just a problem we're finding with the line ending issues we have.)

> > Do you have any advice on achieving this or working around it? Or
> > indeed, any insight into what exactly Mercurial is doing with the
> > line-endings? I know the eol extension is not the root cause of our
> > issues as we had the problem before we used it. But it doesn't appear
> > to solve them in a way that we can use either. :)

> Well, standard Mercurial won't touch your line endings at all. Mercurial
> treats all files as binary and so it gives you back the bytes you gave it
> originally.

I'm having trouble seeing how that is true in practice, since we are pushing Windows format files to a Linux repository, and when others pull them back to Windows, Visual Studio claims the line endings are inconsistent. Now, I certainly won't claim that Visual Studio is perfect, but this is a problem that only appeared when we migrated to Mercurial from CVS.

Perhaps there is something else messing things up, like Kdiff3. Nothing else has changed in our tool chain except we converted the repository and moved from WinCVS to TortoiseHg for individual developers.

> Perhaps it would be simpler for you to just not use the eol extension and
> commit the files with CRLF line endings in the repository? After all, it's
> no crime to have CRLF files in a repository :)

That's exactly what we were doing, but somehow the line endings were (and still are) coming back inconsistent. Harvey Chapman wrote yesterday that he felt that Visual Studio was the cause of the problem, and maybe he's right, but it's hard to see how that would be the case given that there were no such problems with CVS and that they appeared as soon as we moved to Mercurial/TortoiseHg with no other changes.

--
Ben Sizer

RE: Line endings and whitespace issues on Windows

Ben Sizer-3
In reply to this post by Andreas Tscharner
Andreas Tscharner wrote:

> Ben Sizer wrote:
> > Even when CVS had to do merges, the line endings would be fine.
> > Obviously it's impossible to rule out the possibility that somehow we
> > never saw the inconsistency and that CVS fixed it all up for us in
> > flight, but it seems more likely that  Mercurial is getting

> Depends on what CVS you had been using. CVSNT was very good at that
> point. Internally, they used the Linux line ending, but on Windows
> (or even Mac IIRC), they changed them while checking out or updating...

That definitely makes some sort of sense. That's the sort of behaviour I'd like to see from Mercurial, doing that sort of change at that stage, if you explicitly configure it to do so of course.

It doesn't explain how the inconsistent line endings are arriving in the first place however. I suppose I need some way of diagnosing exactly what state each version of the file is in: working copy, local repo copy, and remote repo copy.
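For the diagnosis described here, one option is a small script that counts each kind of line ending in a file's raw bytes. This is a sketch that assumes you can get each copy of the file onto disk: run it on the working-copy file and on copies taken from the local and remote repositories (for example exported with `hg cat -r REV FILE`). Make sure no line-ending filters such as eol or win32text are active while exporting, or you may measure filtered bytes rather than the stored ones. The function names are hypothetical.

```python
import sys
from collections import Counter


def count_line_endings(data: bytes) -> Counter:
    """Count CRLF, bare LF and bare CR line endings in raw bytes."""
    counts = Counter()
    i, n = 0, len(data)
    while i < n:
        byte = data[i]
        if byte == 0x0D:  # CR
            if i + 1 < n and data[i + 1] == 0x0A:  # CR followed by LF
                counts["CRLF"] += 1
                i += 2
                continue
            counts["CR"] += 1
        elif byte == 0x0A:  # LF not preceded by CR
            counts["LF"] += 1
        i += 1
    return counts


def describe(counts: Counter) -> str:
    """Summarize counts as a single style, 'mixed (...)', or none."""
    kinds = [k for k in ("CRLF", "LF", "CR") if counts[k]]
    if not kinds:
        return "no line endings"
    if len(kinds) == 1:
        return kinds[0]
    return "mixed (" + ", ".join(f"{k}={counts[k]}" for k in kinds) + ")"


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:  # binary mode: no newline translation
            print(path, describe(count_line_endings(f.read())))
```

A file that Visual Studio reports as inconsistent should show up here as "mixed", which at least tells you which copy first picked up the stray endings.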

--
Ben Sizer

Re: Line endings and whitespace issues on Windows

cboos
In reply to this post by Martin Geisler
  On 9/3/2010 11:57 AM, Martin Geisler wrote:

> Ben Sizer<[hidden email]>  writes:
>
>> Christian Boos wrote:
>>
>>> The problem, if it is considered to be one, can be reproduced using
>>> the following script, with the eol extension active, on Windows (so
>>> native == CRLF):
>>>
>>>   $ hg init eol-issue
>>>   $ cd eol-issue
>>>   $ echo -e "CRLF\r" > crlf
>>>   $ hg ci -Am "file with crlf in repos"
>>>   adding crlf
>>>   $ hg status
>>>   $ echo "[patterns]" > .hgeol
>>>   $ echo "** = native" >> .hgeol
>>>   $ hg status
>>>   M crlf
>>>   ? .hgeol
>>>
>>> But:
>>>
>>>   $ hg diff
>>>   diff -r 0b6d36c04da6 crlf
>>>   --- a/crlf      Thu Sep 02 20:48:09 2010 +0200
>>>   +++ b/crlf      Thu Sep 02 20:51:48 2010 +0200
>>>   @@ -1,1 +1,1 @@
>>>   -CRLF
>>>   +CRLF
> Is this diff now just showing you that the crlf file will be modified by
> the next commit? I know you cannot really see the change in line endings
> but I think that is more a problem with the diff format.

Ah yes, that's right, if I do 'hg ci', there will indeed be a commit
created, which is also not completely intuitive as both the repository
and the working directory had exactly the same byte content (CRLF).
But after the commit, the repository content has been "normalized" to
LF. Good!

> When you say '** = native', you are asking for files to have native line
> endings in the working copy and *LF* line endings in the repository.

Ok, that's what I wanted. More precisely, I wanted to have native line
endings in the working copy and actually didn't care what's in the
repository, as I didn't expect it would make a difference. But, my
fault, I overlooked the [repository] section of .hgeol and indeed the
extension help states that the default storage for "native" files is
"LF". I guess I expected the "auto" value suggested in issue2355 to be
the default...

> The eol extension will *normalize* the line endings stored in the
> repository, so yes, it can certainly be the case that a file now has
> outstanding changes when you enable the eol extension.
>
> It would be nice if the extension would take note of the existing line
> endings and automatically convert back and forth between them -- see
> this issue which was opened just yesterday:
>
>    http://mercurial.selenic.com/bts/issue2355

Thanks for the detailed reply, and a +1 for the "auto" feature which
would make a useful default (assuming mixed line endings are not
allowed, detecting the repository format on the fly shouldn't be a problem).
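The reasoning about the proposed "auto" setting from issue2355 can be sketched: if a stored file uses a single line-ending style, that style can be detected and reused when writing changes back. This only illustrates the idea under the stated assumption (no mixed endings); the names and the LF fallback are hypothetical, and the thread does not say how issue2355 would actually be implemented.

```python
from typing import Optional


def detect_stored_eol(stored: bytes) -> Optional[bytes]:
    """Guess the single EOL style of a stored file.

    Returns b"\r\n" or b"\n", or None when the file has no line endings
    or mixes CRLF with bare LF.
    """
    crlf = stored.count(b"\r\n")
    lf = stored.count(b"\n") - crlf  # LFs that are not part of a CRLF pair
    if crlf and not lf:
        return b"\r\n"
    if lf and not crlf:
        return b"\n"
    return None


def encode_auto(working: bytes, stored: bytes) -> bytes:
    """Write working-copy bytes back using the file's existing EOL style."""
    eol = detect_stored_eol(stored) or b"\n"  # hypothetical fallback: LF
    return working.replace(b"\r\n", b"\n").replace(b"\n", eol)


# Christian's CRLF file: the filtered bytes equal the stored bytes,
# so under this scheme it would not show up as modified.
print(encode_auto(b"CRLF\r\n", b"CRLF\r\n") == b"CRLF\r\n")
```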

-- Christian


Re: Line endings and whitespace issues on Windows

Martin Geisler
In reply to this post by Ben Sizer-3
Ben Sizer <[hidden email]> writes:

> Martin Geisler wrote:
>
>> You cannot have a change that is not part of a changeset -- we have
>> no "room" to store such a change.
>
> Yeah, that makes sense. In that case, it seems like this is the sort
> of change that I'd like to happen automatically after an update and
> before a commit, so that there is no change to what is in the
> repository, just to what happens to your working copy. At least,
> that's the use case that I would want.
That is actually also what the eol extension does: it installs a set of
filters that are applied on all bytes read and written to the working
copy.

So on Windows, when you read LF bytes from the repository (the actual
history) it will write CRLF bytes to the working copy. And when you
commit your CRLF bytes from the working copy, they are filtered back to
LF bytes in the repository.
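The two directions described here can be sketched as a pair of byte filters, assuming a Windows client (native = CRLF) and the default LF storage. The function names are hypothetical and this is not the eol extension's implementation, only a model of the conversion it is described as doing.

```python
def to_working_copy(repo_bytes: bytes) -> bytes:
    """Repository (LF) -> working copy (CRLF)."""
    # Collapse any existing CRLF first so it does not become CR CR LF.
    return repo_bytes.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def to_repository(working_bytes: bytes) -> bytes:
    """Working copy (CRLF) -> repository (LF)."""
    return working_bytes.replace(b"\r\n", b"\n")


history = b"int main(void)\n{\n    return 0;\n}\n"
checked_out = to_working_copy(history)
assert checked_out == b"int main(void)\r\n{\r\n    return 0;\r\n}\r\n"

# Committing an untouched checkout stores exactly the original bytes again.
assert to_repository(checked_out) == history

# A file already stored with CRLF does not survive the round trip unchanged:
# this is the one-time normalizing change discussed in the thread.
assert to_repository(to_working_copy(b"CRLF\r\n")) == b"CRLF\n"
```

Because an untouched checkout round-trips to the same bytes, in this model only files that were stored with CRLF produce the one normalizing change.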

>> If you have a file with the "wrong" line endings in the repository,
>> then there will be one such changeset after you enable the extension.
>> That should be all.
>
> Unfortunately just one such changeset is enough to ruin hg annotate,
> polluting it with a visible change to every line that carries little
> useful information.

I don't think it will ruin anything -- use annotate in TortoiseHg and
when you see line ending change, just right-click on the line and choose
'Annotate Parent'. That way it is super easy to "peel off" each change
until you reach the one you are looking for.

> Additionally, we're finding there are lots and lots of individual
> changesets for line ending changes, as we gradually work through
> various files. This makes tracking the important changes somewhat less
> convenient.

It should be just one change. If you in a clean working copy do

  hg update null
  hg update

then you should ensure that the filters are run on all files and so all
relevant changes can be committed in one changeset.

The problem I worry about is that the eol extension does indeed work on
a file-by-file basis, so if you enable it while you have lots of files
checked out, then they won't be converted.

>> Well, standard Mercurial won't touch your line endings at all.
>> Mercurial treats all files as binary and so it gives you back the
>> bytes you gave it originally.
>
> I'm having trouble seeing how that is true in practice, since we are
> pushing Windows format files to a Linux repository, and when others
> pull them back to Windows, Visual Studio claims the line endings are
> inconsistent.

Please check what extensions you have enabled and please try to
reproduce with a small file for which you can make a hexdump.

Mercurial is really not touching the bytes by default -- you have to go
out of your way to make it do that by enabling extensions such as the
eol or the keyword extensions.
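If no hexdump tool is at hand on Windows, a few lines of Python can stand in for one when preparing such a small reproducing file. In the output, `0d 0a` is a CRLF pair and a lone `0a` is a bare LF. The `hexdump` function is a hypothetical helper, not part of Mercurial.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Return an offset / hex / ASCII dump of raw bytes."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is and everything else (CR, LF, ...) as '.'
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  |{ascii_part}|")
    return "\n".join(lines)


# One CRLF line followed by one LF-only line: a "mixed" file.
print(hexdump(b"CRLF\r\nLF only\n"))
```

To dump a real file, pass it the bytes read with `open(path, "rb").read()` so that Python does not translate the newlines first.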

>> Perhaps it would be simpler for you to just not use the eol extension
>> and commit the files with CRLF line endings in the repository? After
>> all, it's no crime to have CRLF files in a repository :)
>
> That's exactly what we were doing, but somehow the line endings were
> (and still are) coming back inconsistent. Harvey Chapman wrote
> yesterday that he felt that Visual Studio was the cause of the
> problem, and maybe he's right, but it's hard to see how that would be
> the case given that there were no such problems with CVS and that they
> appeared as soon as we moved to Mercurial/TortoiseHg with no other
> changes.
Strange... I think older versions of TortoiseHg came with the win32text
extension enabled by default. However, the eol extension aborts if it
sees that win32text is loaded so I don't think that is your problem.

--
Martin Geisler

Mercurial links: http://mercurial.ch/


Re: Line endings and whitespace issues on Windows

Harvey Chapman-9
In reply to this post by Ben Sizer-3
On Sep 3, 2010, at 5:53 AM, Ben Sizer wrote:

> Harvey Chapman wrote:
>
>> Ben Sizer wrote:
>>> Firstly, we were finding that Visual Studio kept telling us that files had inconsistent line endings.
>
>> We used to have this problem, but it was Visual Studio causing the problem (also, we were using subversion).
>
> How could you tell it was caused by Visual Studio?

I'm working from foggy memory here. It wasn't consistent. I think it happened when we copy/pasted code from a webpage or some other CRLF source into an LF-only file in a VS editor window; I think VS would preserve the CRs. Actually, this kind of makes sense to me now, since we also used to copy/paste Unicode characters into source files for testing, because that was the easiest way to get some foreign characters into our test code. In other words, I'd be upset if VS modified a "CRLF" that was actually part of some foreign characters. Also, we often use VS in VMware machines and we do copy/paste from one operating system to another. I could be completely wrong about this, but I think that was the problem.

Even if it wasn't VS, it wasn't Mercurial in our case, because we were using SVN and had never even heard of Hg (oh, how I wish we had).

RE: Line endings and whitespace issues on Windows

Ben Sizer-3
In reply to this post by Martin Geisler
Martin Geisler wrote:

>Ben Sizer wrote:
>>
>> Yeah, that makes sense. In that case, it seems like this is the sort
>> of change that I'd like to happen automatically after an update and
>> before a commit, so that there is no change to what is in the
>> repository, just to what happens to your working copy. At least,
>> that's the use case that I would want.

> That is actually also what the eol extension does: it installs a set
> of filters that are applied on all bytes read and written to the
> working copy.

Ideally, that would be enough for us. But in practice it hasn't worked out. We don't really care how the data is stored, as long as the endings stay consistent, but for some reason they don't.
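The filter pair Martin describes can be modelled as two byte transformations. This is a simplified sketch, not the eol extension's source: it assumes native means CRLF (Windows) and that the repository format is LF, the function names are hypothetical, and the real extension has extra safeguards this sketch omits.

```python
def to_working_copy(repo_bytes: bytes, native: bytes = b"\r\n") -> bytes:
    """Repository -> working copy: expand LF to the native line ending."""
    # Normalise first so that any existing CRLF is not turned into CR CR LF.
    lf_only = repo_bytes.replace(b"\r\n", b"\n")
    return lf_only.replace(b"\n", native)


def to_repository(wc_bytes: bytes) -> bytes:
    """Working copy -> repository: store CRLF as LF."""
    return wc_bytes.replace(b"\r\n", b"\n")


stored = b"int x;\nint y;\n"
on_disk = to_working_copy(stored)          # what update writes on Windows
assert to_repository(on_disk) == stored    # what commit stores again
```

Note that `to_repository` leaves a working file that is already LF untouched: the filters only act on bytes as they pass between working copy and repository, so a file that has not been rewritten since the extension was enabled keeps whatever endings it had on disk.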


>>> If you have a file with the "wrong" line endings in the repository,
>>> then there will be one such changeset after you enable the extension.
>>> That should be all.
>>
>> Unfortunately just one such changeset is enough to ruin hg annotate,
>> polluting it with a visible change to every line that carries little
>> useful information.
>
> I don't think it will ruin anything -- use annotate in TortoiseHg and
> when you see line ending change, just right-click on the line and choose
> 'Annotate Parent'. That way it is super easy to "peel off" each change
> until you reach the one you are looking for.

Ah, thanks for the tip. That definitely makes it a bit more manageable.

>> Additionally, we're finding there are lots and lots of individual
>> changesets for line ending changes, as we gradually work through
>> various files. This makes tracking the important changes somewhat less
>> convenient.
>
> It should be just one change. If you do this in a clean working copy:
>
>  hg update null
>  hg update
>
> then you ensure that the filters are run on all files, and so all
> relevant changes can be committed in one changeset.

That's not the experience we had, unfortunately. We had clean repositories after enabling the eol extension, and then the next commit did indeed change a large number of files in the way you are describing, typically replacing the whole file. But it didn't convert every file, and some of the remaining ones have given us problems since then.

>>> Well, standard Mercurial won't touch your line endings at all.
>>> Mercurial treats all files as binary and so it gives you back the
>>> bytes you gave it originally.
>>
>> I'm having trouble seeing how that is true in practice, since we are
>> pushing Windows format files to a Linux repository, and when others
>> pull them back to Windows, Visual Studio claims the line endings are
>> inconsistent.
>
> Please check what extensions you have enabled and please try to
> reproduce with a small file for which you can make a hexdump.

Locally, our copies of TortoiseHg just have eol enabled, with our source files set to 'native' locally. Remotely, our central push repo on a virtual Linux box appears to not have any extensions enabled at all. I wonder if this asymmetry is part of the problem.

Unfortunately we really don't know what causes this to happen, so deliberately producing a test case is going to be difficult. But when it next arises, I'll try to get a copy of the file.

--
Ben Sizer

Re: Line endings and whitespace issues on Windows

Steve Borho
In reply to this post by Martin Geisler
On Fri, Sep 3, 2010 at 7:35 AM, Martin Geisler <[hidden email]> wrote:

> Ben Sizer <[hidden email]> writes:
>
>> Martin Geisler wrote:
>>
>>> You cannot have a change that is not part of a changeset -- we have
>>> no "room" to store such a change.
>>
>> Yeah, that makes sense. In that case, it seems like this is the sort
>> of change that I'd like to happen automatically after an update and
>> before a commit, so that there is no change to what is in the
>> repository, just to what happens to your working copy. At least,
>> that's the use case that I would want.
>
> That is actually also what the eol extension does: it installs a set of
> filters that are applied on all bytes read and written to the working
> copy.
>
> So on Windows, when you read LF bytes from the repository (the actual
> history) it will write CRLF bytes to the working copy. And when you
> commit your CRLF bytes from the working copy, they are filtered back to
> LF bytes in the repository.
>
>>> If you have a file with the "wrong" line endings in the repository,
>>> then there will be one such changeset after you enable the extension.
>>> That should be all.
>>
>> Unfortunately just one such changeset is enough to ruin hg annotate,
>> polluting it with a visible change to every line that carries little
>> useful information.
>
> I don't think it will ruin anything -- use annotate in TortoiseHg and
> when you see line ending change, just right-click on the line and choose
> 'Annotate Parent'. That way it is super easy to "peel off" each change
> until you reach the one you are looking for.
>
>> Additionally, we're finding there are lots and lots of individual
>> changesets for line ending changes, as we gradually work through
>> various files. This makes tracking the important changes somewhat less
>> convenient.
>
> It should be just one change. If you do this in a clean working copy:
>
>  hg update null
>  hg update
>
> then you ensure that the filters are run on all files, and so all
> relevant changes can be committed in one changeset.
>
> The problem I worry about is that the eol extension does indeed work on
> a file-by-file basis, so if you enable it while you have lots of files
> checked out, then they won't be converted.
>
>>> Well, standard Mercurial won't touch your line endings at all.
>>> Mercurial treats all files as binary and so it gives you back the
>>> bytes you gave it originally.
>>
>> I'm having trouble seeing how that is true in practice, since we are
>> pushing Windows format files to a Linux repository, and when others
>> pull them back to Windows, Visual Studio claims the line endings are
>> inconsistent.
>
> Please check what extensions you have enabled and please try to
> reproduce with a small file for which you can make a hexdump.
>
> Mercurial is really not touching the bytes by default -- you have to go
> out of your way to make it do that by enabling extensions such as the
> eol or the keyword extensions.
>
>>> Perhaps it would be simpler for you to just not use the eol extension
>>> and commit the files with CRLF line endings in the repository? After
>>> all, it's no crime to have CRLF files in a repository :)
>>
>> That's exactly what we were doing, but somehow the line endings were
>> (and still are) coming back inconsistent. Harvey Chapman wrote
>> yesterday that he felt that Visual Studio was the cause of the
>> problem, and maybe he's right, but it's hard to see how that would be
>> the case given that there were no such problems with CVS and that they
>> appeared as soon as we moved to Mercurial/TortoiseHg with no other
>> changes.
>
> Strange... I think older versions of TortoiseHg came with the win32text
> extension enabled by default. However, the eol extension aborts if it
> sees that win32text is loaded so I don't think that is your problem.

I can confirm older versions did load the win32text extension in the
site-wide Mercurial.ini file, but none of the conversion hooks were
enabled so it was mostly harmless.  We quit enabling win32text at all
a long time ago.  The first version of Mercurial.ini checked into our
repo in the 0.7 time frame (17 months ago) already had it disabled.

--
Steve Borho

Re: Line endings and whitespace issues on Windows

Martin Geisler
In reply to this post by Ben Sizer-3
Ben Sizer <[hidden email]> writes:

> Martin Geisler wrote:
>
>> Please check what extensions you have enabled and please try to
>> reproduce with with a small file for which you can make a hexdump.
>
> Locally, our copies of TortoiseHg just have eol enabled, with our
> source files set to 'native' locally. Remotely, our central push repo
> on a virtual Linux box appears to not have any extensions enabled at
> all. I wonder if this asymmetry is part of the problem.

No, extensions on the server are not the problem. All changesets in
Mercurial contain a hash value which defines their "identity", and that
would change drastically if even a single bit were flipped by the server.

So if you see a changeset with hash value 41c42b69055b on the server and
you pull that into your own clone, then you can be certain that they are
identical, bit for bit.
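Martin's argument can be illustrated with a simplified Python model of how Mercurial derives a revision id: a SHA-1 over the two parent ids (in sorted order) followed by the revision's text. This is a sketch of the idea, not Mercurial's code, and `node_id` is a hypothetical name. A real changeset id hashes the changeset entry, which records the manifest id, which in turn records each file's id, so a changed byte in any file also changes the changeset hash.

```python
import hashlib

# 20 zero bytes: the id Mercurial uses for "no parent" (the null revision).
NULL_ID = b"\0" * 20


def node_id(text: bytes, p1: bytes = NULL_ID, p2: bytes = NULL_ID) -> str:
    """Simplified revlog-style id: SHA-1 of the sorted parent ids, then the text."""
    first, second = sorted((p1, p2))
    h = hashlib.sha1(first)
    h.update(second)
    h.update(text)
    return h.hexdigest()


lf_version = b"int main() {\n    return 0;\n}\n"
crlf_version = lf_version.replace(b"\n", b"\r\n")

# Same source, different line endings: the two ids are completely different.
print(node_id(lf_version)[:12])
print(node_id(crlf_version)[:12])
```

So a server that silently rewrote line endings would produce changesets with different ids from the ones that were pushed, which is not what happens.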

> Unfortunately we really don't know what causes this to happen so
> deliberately producing a test case is going to be difficult. But when
> it next arises, I'll try and get a copy of the file.

Okay. Until then, please disable the eol extension too and just commit
the files with CRLF into the repository. That seems like the most
sensible way to manage your project, since I don't think you need to
convert the files to LF at all.

--
Martin Geisler

Mercurial links: http://mercurial.ch/

Re: Line endings and whitespace issues on Windows

Martin Geisler
In reply to this post by Steve Borho
Steve Borho <[hidden email]> writes:

> On Fri, Sep 3, 2010 at 7:35 AM, Martin Geisler <[hidden email]> wrote:
>> Ben Sizer <[hidden email]> writes:
>>
>>> That's exactly what we were doing, but somehow the line endings were
>>> (and still are) coming back inconsistent. Harvey Chapman wrote
>>> yesterday that he felt that Visual Studio was the cause of the
>>> problem, and maybe he's right, but it's hard to see how that would
>>> be the case given that there were no such problems with CVS and that
>>> they appeared as soon as we moved to Mercurial/TortoiseHg with no
>>> other changes.
>>
>> Strange... I think older versions of TortoiseHg came with the win32text
>> extension enabled by default. However, the eol extension aborts if it
>> sees that win32text is loaded so I don't think that is your problem.
>
> I can confirm older versions did load the win32text extension in the
> site-wide Mercurial.ini file, but none of the conversion hooks were
> enabled so it was mostly harmless. We quit enabling win32text at all a
> long time ago. The first version of Mercurial.ini checked into our
> repo in the 0.7 time frame (17 months ago) already had it disabled.

Yeah, sorry, that idea did not make much sense now that I think about it
again.

--
Martin Geisler

Mercurial links: http://mercurial.ch/

Re: Line endings and whitespace issues on Windows

Mike Meyer-2
In reply to this post by Harvey Chapman-9
On Fri, 3 Sep 2010 10:23:41 -0400
Harvey Chapman <[hidden email]> wrote:

> On Sep 3, 2010, at 5:53 AM, Ben Sizer wrote:
>
> > Harvey Chapman wrote:
> >
> >> Ben Sizer wrote:
> >>> Firstly, we were finding that Visual Studio kept telling us that files had inconsistent line endings.
> >
> >> We used to have this problem, but it was Visual Studio causing the problem (also, we were using subversion).
> >
> > How could you tell it was caused by Visual Studio?
>
> I'm working from foggy memory here. It wasn't consistent.

I know from personal experience that you can get inconsistent line
endings from Subversion without Visual Studio being part of the
equation. Subversion has a per-user setting that does what the eol
extension does for hg. That just made things worse unless everyone
used it, as it created three types of users: two types who added their
native line endings when adding/editing text, and one type that
converted all the line endings to the canonical form whenever they
checked in a file.

    <mike
--
Mike Meyer <[hidden email]> http://www.mired.org/consulting.html
Independent Network/Unix/Perforce consultant, email for more information.

O< ascii ribbon campaign - stop html mail - www.asciiribbon.org

RE: Line endings and whitespace issues on Windows

Ben Sizer-3
In reply to this post by cboos
>From: Christian Boos [mailto:[hidden email]]
>Sent: 03 September 2010 12:46
>  On 9/3/2010 11:57 AM, Martin Geisler wrote:
>> Ben Sizer<[hidden email]>  writes:
>>
>>> Christian Boos wrote:
>>>
>>>> The problem, if it is considered to be one, can be reproduced using
>>>> the following script, with the eol extension active, on Windows (so
>>>> native == CRLF):

[...example snipped...]

>> Is this diff now just showing you that the crlf file will be modified
>> by the next commit? I know you cannot really see the change in line
>> endings but I think that is more a problem with the diff format.
>
>Ah yes, that's right, if I do 'hg ci', there will indeed be a commit created,
>which is also not completely intuitive as both the repository and the working
>directory had exactly the same byte content (CRLF).
>But after the commit, the repository content has been "normalized" to LF. Good!

The strange thing is, this is not the behaviour I see. Or at least, not the way I understand it.

My .hgeol contains this line:
**.cpp = native

Native is Windows in this case. We don't explicitly specify a repository format.

I have a file in my local repository that is in Unix format, for whatever reason. (Loads of them, in fact, which I expect is the underlying issue for my problems.)

Mercurial, with eol enabled, does not believe this file has any amendments waiting to check in. Is this because, since the file type and the repository are both 'native', it assumes no conversion ever needs to take place?

When I introduce an arbitrary modification to the file, the diff looks like normal - only the specific modification is being noted. It's making no attempt to convert the other lines. My non-native file is apparently going to be committed in non-native format.

Yet I have seen examples like the one Christian posted, where a file appears to have changes that you need to check in purely because the eol extension spotted that the line endings weren't what it expected. I don't understand why it happens sometimes and not others.

In fact, if I shelve the change, it still shows up as modified, presumably because it has the hidden line-ending changes to commit. If I do this with eol switched off, the file is unmodified, as expected.

(Note that the file did not show up as modified before I made my arbitrary change. This is what I meant in an earlier email when I pointed out that you do not actually get a single commit changing all your line endings when you switch eol on, as was suggested; you appear to get individual ones when you make commits for other reasons.)

Once I committed this, the changeset contained no files, and no patch. This phantom changeset is very bizarre.


>> When you say '** = native', you are asking for files to have native
>> line endings in the working copy and *LF* line endings in the repository.
>
>> The eol extension will *normalize* the line endings stored in the
>> repository, so yes, it can certainly be the case that a file now has
>> outstanding changes when you enable the eol extension.

And this is what I find strange - normalisation, to me, implies converting to a known quantity. But in this case, it's as if it won't attempt to do so, because it's already 'outside' the normal conversion path. I would hope that an extension could make an attempt to convert any file with consistent endings to the 'normal' type. (And I would hope that it does this transparently, although it seems from previous answers that sometimes it does this, sometimes it needs an explicit change, and I don't quite understand which is which yet.)
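The two behaviours described here (an LF file that stays "clean", versus Christian's CRLF file that shows a change with no visible diff) are consistent with a simple model in which status compares the working copy, after the working-copy-to-repository filter, with the stored bytes. This sketch assumes a `** = native` rule on Windows (CRLF on disk, LF in the repository), ignores the size/timestamp shortcut the real `hg status` takes via the dirstate, and uses hypothetical function names; it is not Mercurial's status code.

```python
def to_repository(wc_bytes: bytes) -> bytes:
    """Working copy -> repository direction of a '** = native' rule."""
    return wc_bytes.replace(b"\r\n", b"\n")


def looks_modified(stored: bytes, working: bytes) -> bool:
    """Simplified status check: filtered working-copy bytes vs. stored bytes."""
    return to_repository(working) != stored


cases = {
    # Stored as LF and never re-checked-out: still LF on disk, so it looks
    # clean, even though Visual Studio would expect CRLF.
    "LF stored, LF on disk": (b"a\nb\n", b"a\nb\n"),
    # Stored as CRLF: the filter turns the working copy into LF, which differs
    # from the stored bytes, so a change appears with no visible diff.
    "CRLF stored, CRLF on disk": (b"a\r\nb\r\n", b"a\r\nb\r\n"),
    # The steady state the extension aims for.
    "LF stored, CRLF on disk": (b"a\nb\n", b"a\r\nb\r\n"),
}

for label, (stored, working) in cases.items():
    print(f"{label}: {'modified' if looks_modified(stored, working) else 'clean'}")
```

In this model, running `hg update null` and then `hg update`, as Martin suggests, rewrites the first file as CRLF on disk and moves it into the third case, while a file stored with CRLF keeps showing as modified until the normalising commit is made.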

--
Ben Sizer

