D8189: testlib: add a small scrip to help process to synchronise using file

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
16 messages Options
Reply | Threaded
Open this post in threaded view
|

D8189: testlib: add a small scrip to help process to synchronise using file

valentin.gatienbaron (Valentin Gatien-Baron)
marmoute created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  Creating and waiting for files is a robust way to synchronise two processes
  running concurrently. We already use this approach in various tests. I am adding
  a official script to do so before adding more usage of this.

REPOSITORY
  rHG Mercurial

BRANCH
  default

REVISION DETAIL
  https://phab.mercurial-scm.org/D8189

AFFECTED FILES
  tests/testlib/wait-on-file

CHANGE DETAILS

diff --git a/tests/testlib/wait-on-file b/tests/testlib/wait-on-file
new file mode 100755
--- /dev/null
+++ b/tests/testlib/wait-on-file
@@ -0,0 +1,32 @@
+#!/bin/bash
+#
+# wait up to TIMEOUT seconds until a WAIT_ON_FILE is created.
+#
+# In addition, this script can create CREATE_FILE once it is ready to wait.
+
+if [ $# -lt 2 ] || [ $# -gt 3 ]; then
+    echo $#
+    echo "USAGE: $0 TIMEOUT WAIT_ON_FILE [CREATE_FILE]"
+fi
+
+timer="$1"
+wait_on="$2"
+create=""
+if [ $# -eq 3 ]; then
+    create="$3"
+fi
+
+if [ -n "$create" ];
+then
+    touch "$create"
+    create=""
+fi
+while [ "$timer" -gt 0 ] && [ ! -f "$wait_on" ];
+do
+    timer=$(( timer - 1))
+    sleep 1
+done
+if [ "$timer" -le 0 ]; then
+    echo "file not created after $1 seconds: $wait_on" >&2
+    exit 1
+fi



To: marmoute, #hg-reviewers
Cc: mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

D8189: testlib: add a small scrip to help process to synchronise using file

valentin.gatienbaron (Valentin Gatien-Baron)
durin42 added a comment.


  This script feels extremely like a flaky test waiting to happen. Is there no alternative for this?

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8189/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8189

To: marmoute, #hg-reviewers
Cc: durin42, mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

D8189: testlib: add a small scrip to help process to synchronise using file

valentin.gatienbaron (Valentin Gatien-Baron)
In reply to this post by valentin.gatienbaron (Valentin Gatien-Baron)
marmoute added a comment.


  By using explicit wait on signal (through the fs) that each process reached the appropriate file. We avoid flackyness. There are already multiple use of this approach in the test suite, that does not suffer from flackyness (unlike the wheelbarrow of flaky test relying on sleep for sync).

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8189/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8189

To: marmoute, #hg-reviewers
Cc: durin42, mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

D8189: testlib: add a small scrip to help process to synchronise using file

valentin.gatienbaron (Valentin Gatien-Baron)
In reply to this post by valentin.gatienbaron (Valentin Gatien-Baron)
durin42 added a comment.


  In D8189#123280 <https://phab.mercurial-scm.org/D8189#123280>, @marmoute wrote:
 
  > By using explicit wait on signal (through the fs) that each process reached the appropriate file. We avoid flackyness. There are already multiple use of this approach in the test suite, that does not suffer from flackyness (unlike the wheelbarrow of flaky test relying on sleep for sync).
 
  You avoid flakyness iff the test manages to finish this step in under 20 seconds (in the next change, as an example). Which is to say, this is still a flake waiting to happen, you've just made it less likely. I think it might be better to poll more often in the script and not even take a timeout: sleep forever waiting for the condition, and if it never comes let the test timeout at the runner level. Thoughts?
 
  I'm also not happy about the 1-second floor this puts on the step. Doesn't sleep(1) support sub-second sleeps on all platforms at this point?

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8189/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8189

To: marmoute, #hg-reviewers
Cc: durin42, mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

D8189: testlib: add a small scrip to help process to synchronise using file

valentin.gatienbaron (Valentin Gatien-Baron)
In reply to this post by valentin.gatienbaron (Valentin Gatien-Baron)
mharbison72 added a comment.


  In D8189#123323 <https://phab.mercurial-scm.org/D8189#123323>, @durin42 wrote:
 
  > I think it might be better to poll more often in the script and not even take a timeout: sleep forever waiting for the condition, and if it never comes let the test timeout at the runner level. Thoughts?
 
  I filed a bug about this that self-archived, but:
 
    $ echo '  $ sleep 10' > test-timeout.t
    $ time ./run-tests.py --local test-timeout.t -t 5
    running 1 tests using 1 parallel processes
    t
    Failed test-timeout.t: timed out
    # Ran 1 tests, 0 skipped, 1 failed.
    python hash seed: 204038743
   
    real    0m10.363s
    user    0m0.000s
    sys     0m0.030s
 
  So it looks like tests never timeout, but then the result is discarded afterward if the timeout period elapsed.  I can reproduce it on Windows and macOS.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8189/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8189

To: marmoute, #hg-reviewers
Cc: mharbison72, durin42, mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

D8189: testlib: add a small scrip to help process to synchronise using file

valentin.gatienbaron (Valentin Gatien-Baron)
In reply to this post by valentin.gatienbaron (Valentin Gatien-Baron)
marmoute added a comment.


  In D8189#123323 <https://phab.mercurial-scm.org/D8189#123323>, @durin42 wrote:
 
  > In D8189#123280 <https://phab.mercurial-scm.org/D8189#123280>, @marmoute wrote:
  >
  >> By using explicit wait on signal (through the fs) that each process reached the appropriate file. We avoid flackyness. There are already multiple use of this approach in the test suite, that does not suffer from flackyness (unlike the wheelbarrow of flaky test relying on sleep for sync).
  >
  > You avoid flakyness iff the test manages to finish this step in under 20 seconds (in the next change, as an example). Which is to say, this is still a flake waiting to happen, you've just made it less likely. I think it might be better to poll more often in the script and not even take a timeout: sleep forever waiting for the condition, and if it never comes let the test timeout at the runner level. Thoughts?
 
  The 20 seconds seems like a lots of margin already, but I am fine with bumping it more it that make your more confortable.
 
  Waiting for the test timeout is not a reasonable option because the test is killed without any details (and it is LOONG). The most common case for reaching the timeout is for one of the process to crash before reaching the checkpoing. When that happens, we want to be able to read the traceback. The second most common is code misbehaving and not going through the expected codepath. We also was to get output in this case. So in short, we need a clean way out in case of error and I have no better option than a (possibly long) timeout right now.
 
  > I'm also not happy about the 1-second floor this puts on the step. Doesn't sleep(1) support sub-second sleeps on all platforms at this point?
 
  I am extremly sad too. But last time I checked, it was still not the case. We detect plateform and use small increment on better plateform (but I would rather follow up for that).

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8189/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8189

To: marmoute, #hg-reviewers
Cc: mharbison72, durin42, mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

D8189: testlib: add a small scrip to help process to synchronise using file

valentin.gatienbaron (Valentin Gatien-Baron)
In reply to this post by valentin.gatienbaron (Valentin Gatien-Baron)
marmoute added a comment.


  In D8189#123521 <https://phab.mercurial-scm.org/D8189#123521>, @mharbison72 wrote:
 
  > In D8189#123323 <https://phab.mercurial-scm.org/D8189#123323>, @durin42 wrote:
  >
  >> I think it might be better to poll more often in the script and not even take a timeout: sleep forever waiting for the condition, and if it never comes let the test timeout at the runner level. Thoughts?
  >
  > I filed a bug about this that self-archived, but:
  >
  >   $ echo '  $ sleep 10' > test-timeout.t
  >   $ time ./run-tests.py --local test-timeout.t -t 5
  >   running 1 tests using 1 parallel processes
  >   t
  >   Failed test-timeout.t: timed out
  >   # Ran 1 tests, 0 skipped, 1 failed.
  >   python hash seed: 204038743
  >   real    0m10.363s
  >   user    0m0.000s
  >   sys     0m0.030s
  >
  > So it looks like tests never timeout, but then the result is discarded afterward if the timeout period elapsed.  I can reproduce it on Windows and macOS.
 
  Do you have a link to the bug ?

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8189/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8189

To: marmoute, #hg-reviewers
Cc: mharbison72, durin42, mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

D8189: testlib: add a small scrip to help process to synchronise using file

valentin.gatienbaron (Valentin Gatien-Baron)
In reply to this post by valentin.gatienbaron (Valentin Gatien-Baron)
mharbison72 added a comment.


  In D8189#123526 <https://phab.mercurial-scm.org/D8189#123526>, @marmoute wrote:
 
  > In D8189#123521 <https://phab.mercurial-scm.org/D8189#123521>, @mharbison72 wrote:
  >
  >> In D8189#123323 <https://phab.mercurial-scm.org/D8189#123323>, @durin42 wrote:
  >>
  >>> I think it might be better to poll more often in the script and not even take a timeout: sleep forever waiting for the condition, and if it never comes let the test timeout at the runner level. Thoughts?
  >>
  >> I filed a bug about this that self-archived, but:
  >>
  >>   $ echo '  $ sleep 10' > test-timeout.t
  >>   $ time ./run-tests.py --local test-timeout.t -t 5
  >>   running 1 tests using 1 parallel processes
  >>   t
  >>   Failed test-timeout.t: timed out
  >>   # Ran 1 tests, 0 skipped, 1 failed.
  >>   python hash seed: 204038743
  >>   real    0m10.363s
  >>   user    0m0.000s
  >>   sys     0m0.030s
  >>
  >> So it looks like tests never timeout, but then the result is discarded afterward if the timeout period elapsed.  I can reproduce it on Windows and macOS.
  >
  > Do you have a link to the bug ?
 
  https://bz.mercurial-scm.org/show_bug.cgi?id=6125
 
  > Waiting for the test timeout is not a reasonable option because the test is killed without any details (and it is LOONG). The most common case for reaching the timeout is for one of the process to crash before reaching the checkpoing. When that happens, we want to be able to read the traceback. The second most common is code misbehaving and not going through the expected codepath. We also was to get output in this case. So in short, we need a clean way out in case of error and I have no better option than a (possibly long) timeout right now.
 
  I wonder if the test harness can be modified to process the data collected up to the point of the timeout, so that it's obvious what is getting stuck.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8189/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8189

To: marmoute, #hg-reviewers
Cc: mharbison72, durin42, mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

D8189: testlib: add a small scrip to help process to synchronise using file

valentin.gatienbaron (Valentin Gatien-Baron)
In reply to this post by valentin.gatienbaron (Valentin Gatien-Baron)
marmoute added a comment.


  Gentle ping on this patch.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8189/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8189

To: marmoute, #hg-reviewers
Cc: mharbison72, durin42, mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

D8189: testlib: add a small scrip to help process to synchronise using file

valentin.gatienbaron (Valentin Gatien-Baron)
In reply to this post by valentin.gatienbaron (Valentin Gatien-Baron)
durin42 added a comment.


  I'm still -0 on this: I'd rather we found an approach that didn't require sleeping for so long. Perhaps a Python script would be a better fit here?
 
  (I won't block this landing, but I won't push it.)

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8189/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8189

To: marmoute, #hg-reviewers
Cc: mharbison72, durin42, mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

D8189: testlib: add a small scrip to help process to synchronise using file

valentin.gatienbaron (Valentin Gatien-Baron)
In reply to this post by valentin.gatienbaron (Valentin Gatien-Baron)
marmoute added a comment.
marmoute added a subscriber: durin42.


  In D8189#124122 <https://phab.mercurial-scm.org/D8189#124122>, @durin42 wrote:
 
  > I'm still -0 on this: I'd rather we found an approach that didn't require sleeping for so long. Perhaps a Python script would be a better fit here?
  > (I won't block this landing, but I won't push it.)
 
  What abotu replacing the sleep 1 by a `python -c "import time; time.sleep(0.1)" would that make you happy ?

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8189/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8189

To: marmoute, #hg-reviewers
Cc: durin42, mharbison72, mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

D8189: testlib: add a small scrip to help process to synchronise using file

valentin.gatienbaron (Valentin Gatien-Baron)
In reply to this post by valentin.gatienbaron (Valentin Gatien-Baron)
durin42 added a comment.


  In D8189#124124 <https://phab.mercurial-scm.org/D8189#124124>, @marmoute wrote:
 
  > In D8189#124122 <https://phab.mercurial-scm.org/D8189#124122>, @durin42 wrote:
  >
  >> I'm still -0 on this: I'd rather we found an approach that didn't require sleeping for so long. Perhaps a Python script would be a better fit here?
  >> (I won't block this landing, but I won't push it.)
  >
  > What abotu replacing the sleep 1 by a `python -c "import time; time.sleep(0.1)" would that make you happy ?
 
  Happy? No, not really. I think I'd rather it was a Python script, but what I'd _really_ rather (and what would actually make me happy) is that we didn't have sleep-required steps like this at all. They're inherently racy , and I feel like there's got to be a better solution.
 
  At this point I don't have the patience to try and work through this patch, so you'll need to find a different reviewer.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8189/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8189

To: marmoute, #hg-reviewers
Cc: durin42, mharbison72, mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

D8189: testlib: add a small scrip to help process to synchronise using file

valentin.gatienbaron (Valentin Gatien-Baron)
In reply to this post by valentin.gatienbaron (Valentin Gatien-Baron)
marmoute added a comment.
marmoute added a subscriber: durin42.


  In D8189#123522 <https://phab.mercurial-scm.org/D8189#123522>, @marmoute wrote:
 
  > In D8189#123323 <https://phab.mercurial-scm.org/D8189#123323>, @durin42 wrote:
  >
  >> […]
  >> I'm also not happy about the 1-second floor this puts on the step. Doesn't sleep(1) support sub-second sleeps on all platforms at this point?
  >
  > I am extremly sad too. But last time I checked, it was still not the case. We detect plateform and use small increment on better plateform (but I would rather follow up for that).
 
  Good news, even is sub-second is not expect to work on all plateforms, I found out that we already use it in our test suite. So any plateform that does not support it are already broken. I'll send an update soon.
 
  I am adding a small change to have the local timeout adjust itself according to the global time out. I am not aware of real live issues with the local timeout, but adding that logic is simple enough.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8189/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8189

To: marmoute, #hg-reviewers
Cc: durin42, mharbison72, mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

D8189: testlib: add a small scrip to help process to synchronise using file

valentin.gatienbaron (Valentin Gatien-Baron)
In reply to this post by valentin.gatienbaron (Valentin Gatien-Baron)
marmoute updated this revision to Diff 20856.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D8189?vs=20392&id=20856

BRANCH
  default

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8189/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8189

AFFECTED FILES
  tests/testlib/wait-on-file

CHANGE DETAILS

diff --git a/tests/testlib/wait-on-file b/tests/testlib/wait-on-file
new file mode 100755
--- /dev/null
+++ b/tests/testlib/wait-on-file
@@ -0,0 +1,32 @@
+#!/bin/bash
+#
+# wait up to TIMEOUT seconds until a WAIT_ON_FILE is created.
+#
+# In addition, this script can create CREATE_FILE once it is ready to wait.
+
+if [ $# -lt 2 ] || [ $# -gt 3 ]; then
+    echo $#
+    echo "USAGE: $0 TIMEOUT WAIT_ON_FILE [CREATE_FILE]"
+fi
+
+timer="$1"
+wait_on="$2"
+create=""
+if [ $# -eq 3 ]; then
+    create="$3"
+fi
+
+if [ -n "$create" ];
+then
+    touch "$create"
+    create=""
+fi
+while [ "$timer" -gt 0 ] && [ ! -f "$wait_on" ];
+do
+    timer=$(( timer - 1))
+    sleep 0.01
+done
+if [ "$timer" -le 0 ]; then
+    echo "file not created after $1 seconds: $wait_on" >&2
+    exit 1
+fi



To: marmoute, #hg-reviewers
Cc: durin42, mharbison72, mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

D8189: testlib: add a small scrip to help process to synchronise using file

valentin.gatienbaron (Valentin Gatien-Baron)
In reply to this post by valentin.gatienbaron (Valentin Gatien-Baron)
pulkit added a comment.


  I wanted to help with things here but unfortunately I have ~0 experience with shell scripts and the kind of process testing going in next few patches.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8189/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8189

To: marmoute, #hg-reviewers
Cc: pulkit, durin42, mharbison72, mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Reply | Threaded
Open this post in threaded view
|

D8189: testlib: add a small scrip to help process to synchronise using file

valentin.gatienbaron (Valentin Gatien-Baron)
In reply to this post by valentin.gatienbaron (Valentin Gatien-Baron)
Closed by commit rHG1ed6293fc31b: testlib: add a small scrip to help process to synchronise using file (authored by marmoute).
This revision was automatically updated to reflect the committed changes.
This revision was not accepted when it landed; it landed in state "Needs Review".

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D8189?vs=20856&id=20930

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8189/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8189

AFFECTED FILES
  tests/testlib/wait-on-file

CHANGE DETAILS

diff --git a/tests/testlib/wait-on-file b/tests/testlib/wait-on-file
new file mode 100755
--- /dev/null
+++ b/tests/testlib/wait-on-file
@@ -0,0 +1,32 @@
+#!/bin/bash
+#
+# wait up to TIMEOUT seconds until a WAIT_ON_FILE is created.
+#
+# In addition, this script can create CREATE_FILE once it is ready to wait.
+
+if [ $# -lt 2 ] || [ $# -gt 3 ]; then
+    echo $#
+    echo "USAGE: $0 TIMEOUT WAIT_ON_FILE [CREATE_FILE]"
+fi
+
+timer="$1"
+wait_on="$2"
+create=""
+if [ $# -eq 3 ]; then
+    create="$3"
+fi
+
+if [ -n "$create" ];
+then
+    touch "$create"
+    create=""
+fi
+while [ "$timer" -gt 0 ] && [ ! -f "$wait_on" ];
+do
+    timer=$(( timer - 1))
+    sleep 0.01
+done
+if [ "$timer" -le 0 ]; then
+    echo "file not created after $1 seconds: $wait_on" >&2
+    exit 1
+fi



To: marmoute, #hg-reviewers
Cc: pulkit, durin42, mharbison72, mercurial-devel
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel