Understanding and modifying the default diff for commits

Andres Sommerhoff
Hi all, I want to intercept the diff operation used by Mercurial's commit. I want to record only the meaningful changes in a heavy directory tree full of XML files (this makes it easier to audit what has really changed, and saving some disk space along the way doesn’t hurt). I searched the internet for a Mercurial add-on that could help, but was unsuccessful, so I am thinking of writing my own extension (or maybe some scripting in a pre-commit hook).

If I go ahead and write my own extension, I would appreciate any advice on where to start hooking into the diff process during a Mercurial commit. Could anyone help me locate the diff code Mercurial uses, so I can study how the commit interacts with it?

If you are curious about the problem I’m dealing with: it is the software KNIME. Projects (scientific models) developed in KNIME are saved as many XML files, where each XML file represents a small portion of the model (a “node”, as KNIME calls it). One project can easily have more than 500 nodes (that is, more than 500 XML files). If I change a single node and save the project, it is not only the related file that changes: all 500 XML files are updated as well, because the “last modification date” and “last author” inside each XML file are rewritten.

I’m looking to skip all files in which the save only updated the “last modification date” and “last author” and nothing else. That way I can focus on the important changes, auditing the meaningful modifications becomes easy, merges are far less cumbersome, and the history is much cleaner when running a log on a specific file.

Maybe a simple command-line option for commit is the solution, but I see no official option of the commit command for using an alternative diff tool to calculate the patches. On the other hand, as far as I have read, the “extdiff” extension only affects how revisions are compared, not the commit process (maybe I’m wrong on this last point). Or maybe a commit command line using the template function “diff([includepattern [, excludepattern]])” in conjunction with “--exclude” will do the magic I’m looking for, but I couldn’t figure it out yet.

I’m on Windows 10, using Mercurial and TortoiseHg 5.0.2.

Regards, Andres  

_______________________________________________
Mercurial mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial

Re: Understanding and modifying the default diff for commits

Pierre-Yves David
Your use case is a bit unclear to me. I'll ask some silly questions to
try to clarify that.

As far as I understand, you are using KNIME to edit a "project", and
KNIME itself generates a bunch of XML files. Whenever anything happens,
KNIME rewrites all files to update the author and last-update fields,
regardless of whether they are affected by the change or not. Right?

I assume these XML files contain actual data, right? They are not
generated by-products you could exclude, right?

The simplest option would probably be to run a small script that reverts
these files before commit, or that simply deletes these fields, as another
user suggested. You could configure it as a pre-status and pre-commit hook.
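A minimal sketch of the "small script that reverts these before commit" idea, in Python (the language Mercurial itself is written in). It assumes `hg` is on PATH, that the XML files are UTF-8, and that the modification-date and author fields each sit on a line matching the regexes in `METADATA_PATTERNS`. Those patterns, the `REVERT_METADATA_ONLY_RUNNING` guard variable and the `--apply` flag are hypothetical placeholders, not names used by KNIME or Mercurial. The script compares each modified file with its version in the working directory's parent (`hg cat -r .`) and reverts it with `hg revert --no-backup` when every added or removed line is a metadata line.

```python
import difflib
import locale
import os
import re
import subprocess
import sys

# Hypothetical placeholders: replace with the element/attribute names KNIME
# actually writes for the modification date and the author.
METADATA_PATTERNS = [
    re.compile(r"last[_ -]?modified", re.IGNORECASE),
    re.compile(r"last[_ -]?author", re.IGNORECASE),
]

# Hypothetical environment flag so the hg commands run below do not re-trigger
# this hook (a pre-status hook calling "hg status" would otherwise recurse).
GUARD_VAR = "REVERT_METADATA_ONLY_RUNNING"
# Assumption: hg prints file names in the locale's preferred encoding.
ENCODING = locale.getpreferredencoding(False)


def only_metadata_changed(old_text, new_text):
    """True if the texts differ and every added/removed line matches a metadata pattern."""
    if old_text == new_text:
        return False
    for line in difflib.ndiff(old_text.splitlines(), new_text.splitlines()):
        # ndiff prefixes: "- " removed, "+ " added, "  " unchanged, "? " hints
        if line.startswith(("- ", "+ ")):
            if not any(p.search(line[2:]) for p in METADATA_PATTERNS):
                return False
    return True


def hg(*args, cwd=None):
    """Run an hg command with the guard flag set and return its raw stdout bytes."""
    env = dict(os.environ, **{GUARD_VAR: "1"})
    return subprocess.run(["hg", *args], cwd=cwd, env=env,
                          check=True, capture_output=True).stdout


def revert_metadata_only_changes():
    """Revert modified files whose only changes are metadata lines; return their paths."""
    root = hg("root").decode(ENCODING).strip()
    status = hg("status", "--modified", "--no-status", "--print0", cwd=root)
    reverted = []
    for path in filter(None, status.decode(ENCODING).split("\0")):
        # Assumes UTF-8 XML; undecodable bytes are replaced rather than fatal.
        old = hg("cat", "-r", ".", path, cwd=root).decode("utf-8", "replace")
        with open(os.path.join(root, path), "rb") as f:
            new = f.read().decode("utf-8", "replace")
        if only_metadata_changed(old, new):
            reverted.append(path)
    # Revert in batches to stay under Windows command-line length limits.
    for i in range(0, len(reverted), 100):
        hg("revert", "--no-backup", *reverted[i:i + 100], cwd=root)
    return reverted


if __name__ == "__main__" and "--apply" in sys.argv[1:]:
    # Only touch the working copy when explicitly asked, and never recursively.
    if not os.environ.get(GUARD_VAR):
        for p in revert_metadata_only_changes():
            print("reverted (metadata only):", p)
```

Saved as, say, `revert_metadata_only.py`, it could be wired up under `[hooks]` in the repository's `.hg/hgrc` with `pre-commit = python /path/to/revert_metadata_only.py --apply` (and the same for `pre-status`). A `pre-<command>` hook that exits non-zero prevents the command from running, so an error in the script stops the commit instead of committing noise. KNIME will presumably rewrite the reverted files on its next save, so the script needs to run on every commit; try it on a copy of the repository first.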

An interesting feature Mercurial has is filesets (see `hg help fileset`).
It would probably be simple to create a fileset that detects your case and
use it in situations where it helps.

The best solution for you would probably be to teach KNIME not to do
this, but I guess you have tried that already.

Cheers,

--
Pierre-Yves David

Re: Understanding and modifying the default diff for commits

Andres Sommerhoff
Thank you Pierre-Yves, my comments are inline below.

Regards, Andres

On Wed, Jun 10, 2020 at 6:04 AM Pierre-Yves David <[hidden email]> wrote:
> Your use case is a bit unclear to me. I'll ask some silly questions to
> try to clarify that.
>
> As far as I understand, you are using KNIME to edit a "project", and
> KNIME itself generates a bunch of XML files. Whenever anything happens,
> KNIME rewrites all files to update the author and last-update fields,
> regardless of whether they are affected by the change or not. Right?
Yes, that is right.

> I assume these XML files contain actual data, right? They are not
> generated by-products you could exclude, right?
The XML files contain information on how to process the scientific data (mathematical formulas, conditions, filters, code, etc.). I could also save the model with the scientific data included, but before committing to Mercurial I reset the model so that only its structure and formulas remain, without any data.


> The simplest option would probably be to run a small script that reverts
> these files before commit, or that simply deletes these fields, as another
> user suggested. You could configure it as a pre-status and pre-commit hook.
>
> An interesting feature Mercurial has is filesets (see `hg help fileset`).
> It would probably be simple to create a fileset that detects your case and
> use it in situations where it helps.

I will follow your suggestion and test the pre-status and pre-commit hooks. The fileset feature looks promising; maybe the predicates "grep(regex)" and "modified()" can do the magic. However, I believe I will need to apply the regex to the diff report rather than to the whole file to handle my issue (I will investigate a bit further anyway).
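Following up on the idea of matching against the diff report rather than the whole file, here is a hedged Python sketch that reads unified diff text (for example the output of `hg diff`) and lists the files whose added and removed lines all match the metadata regexes. It assumes Mercurial's default or `--git` unified diff layout, with `--- a/...` and `+++ b/...` headers before the first `@@` hunk of each file. `METADATA_PATTERNS` and `metadata_only_files` are hypothetical names, and the regexes are placeholders to replace with the field names KNIME really writes.

```python
import re

# Hypothetical placeholders: replace with the field names KNIME actually writes.
METADATA_PATTERNS = [
    re.compile(r"last[_ -]?modified", re.IGNORECASE),
    re.compile(r"last[_ -]?author", re.IGNORECASE),
]


def metadata_only_files(diff_text):
    """Return, sorted, the files in a unified diff whose changed lines all look like metadata."""
    only_meta = {}  # path -> True while every +/- line seen so far is metadata
    path = None
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff "):
            # A new file section starts ("diff -r ..." or "diff --git ...").
            path, in_hunk = None, False
        elif not in_hunk and line.startswith(("--- ", "+++ ")):
            # File header; hg's default format appends a tab and a timestamp.
            name = line[4:].split("\t")[0]
            if name != "/dev/null":
                path = name[2:] if name.startswith(("a/", "b/")) else name
                only_meta.setdefault(path, True)
        elif line.startswith("@@"):
            in_hunk = True
        elif in_hunk and path is not None and line[:1] in ("+", "-"):
            # Inside a hunk, "+"/"-" lines are content, even if they start with "---".
            if not any(p.search(line[1:]) for p in METADATA_PATTERNS):
                only_meta[path] = False
    return sorted(p for p, ok in only_meta.items() if ok)
```

For example, passing it `subprocess.run(["hg", "diff"], capture_output=True, text=True).stdout` would return candidate files that could then be reverted with `hg revert --no-backup` or left out of a commit with `hg commit -X`. Tracking `in_hunk` matters because a removed XML line that itself begins with `--` would otherwise be mistaken for a file header.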


> The best solution for you would probably be to teach KNIME not to do
> this, but I guess you have tried that already.
I haven't tried to teach KNIME, as it is a huge piece of software and I prefer to learn how to adapt Mercurial. That way I can reuse what I learn for other purposes in the future, beyond the KNIME world (and I think adapting Mercurial will be simpler than adapting KNIME to my needs).
 
