Proposal for cleaning up error reporting

Martin von Zweigbergk via Mercurial-devel
Hi everyone.

Any comments on this early proposal are most welcome:
https://www.mercurial-scm.org/wiki/ErrorCategoriesPlan

Our motivation is to measure the user experience our users are actually getting, and have realtime pager alerts if e.g. a new client or server release causes a big change, but not count or alert on errors that are "not our fault", such as bad user input.
For instance, we have up to 70% error rates on weekends because some people leave something like "hg status" or "hg log" running in a loop on a terminal, and their credentials expire - those should not be counted at all. Even in the middle of workdays, our running error rate is way more than it "should" be, because we currently measure all non-zero status codes.

Thanks
Rodrigo


_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

Re: Proposal for cleaning up error reporting

Augie Fackler-2
I think this makes sense, but I'd love to hear feedback from non-Google people...

Re: Proposal for cleaning up error reporting

Martin von Zweigbergk via Mercurial-devel
Thanks Augie. Me too.
If nobody has thoughts I'll start sending (large, invasive) code changes soon :)


Re: Proposal for cleaning up error reporting

Augie Fackler-2
Maybe bite off one smallish easy case and mail it out to try and force someone to think about it?

Re: Proposal for cleaning up error reporting

Yuya Nishihara
In reply to this post by Augie Fackler-2
On Wed, 15 Jul 2020 12:20:14 -0400, Augie Fackler wrote:
> I think this makes sense, but I'd love to hear feedback from non-Google people...
> > On Jun 12, 2020, at 20:10, Rodrigo Damazio via Mercurial-devel <[hidden email]> wrote:
> > Any comments on this early proposal are most welcome:
> > https://www.mercurial-scm.org/wiki/ErrorCategoriesPlan

I'm skeptical about the use of exit status, but the general idea sounds
good to me. Specialized exception classes should be nice.

One thing to note is that changing low-integer status codes (0, 1, ...)
will break existing scripts, so it should be gated by a config knob. For
example, TortoiseHg tests ret == 1:

% rg -i '(code|ret)\w* == 1' tortoisehg
tortoisehg/hgqt/quickop.py:249:                    if ret == 1:
tortoisehg/hgqt/commit.py:1168:        elif ret == 1 and self.currentAction in ('amend', 'commit'):
tortoisehg/hgqt/bookmark.py:394:        elif ret == 1:
tortoisehg/hgqt/bookmark.py:447:        if ret == 0 or ret == 1:
tortoisehg/hgqt/rename.py:235:        if ret == 1:
tortoisehg/hgqt/lfprompt.py:65:        elif ret == 1:
tortoisehg/hgqt/sync.py:783:        elif ret == 1:
tortoisehg/hgqt/sync.py:855:        elif ret == 1:
tortoisehg/hgqt/sync.py:978:        elif ret == 1:
tortoisehg/hgqt/sync.py:1031:        elif ret == 1:
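
The gating idea above could be sketched roughly as follows. This is only an illustration: the function name, the category names, and the numeric values are all hypothetical and are not taken from Mercurial or from the wiki proposal. With the knob off, every error category collapses to a single legacy code, so scripts that compare against today's values keep working.

```python
# Hypothetical sketch; none of these names or values come from Mercurial.
LEGACY_EXIT_CODE = 255  # assumed legacy catch-all code for aborted commands

# Assumed detailed codes per error category (illustrative values only).
DETAILED_EXIT_CODES = {
    "input": 10,
    "config": 30,
}


def exit_code_for(category, detailed_enabled):
    """Return the process exit code for an error category.

    When the (hypothetical) config knob is off, all categories map to the
    legacy code. When it is on, known categories get their own code and
    unknown ones still fall back to the legacy code.
    """
    if not detailed_enabled:
        return LEGACY_EXIT_CODE
    return DETAILED_EXIT_CODES.get(category, LEGACY_EXIT_CODE)
```

Codes that already carry meaning for callers, such as the ret == 1 checks listed above, are deliberately left out of the table so they are never reassigned.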

Re: Proposal for cleaning up error reporting

Pierre-Yves David-2
In reply to this post by Martin von Zweigbergk via Mercurial-devel
The first time I saw this proposal I thought:

- That is a great idea, I have been unhappy about these return codes for
a long time
- The backward compatibility breakage implications might be huge. We
probably need to gate this behind config at first.
- The proposal is good overall, but I have a couple of adjustments and
improvements in mind.

Then the paperwork gods swiftly jumped back at me and I got drowned in
end-of-fiscal-year stuff and forgot to reply.

Since this was 1 month ago, I am sending this small answer with my
general sentiment to avoid letting this fall through the cracks again.
I'll try to find time to give actual feedback and change proposals on
this soon.

--
Pierre-Yves David

Re: Proposal for cleaning up error reporting

Martin von Zweigbergk via Mercurial-devel
On Thu, Jul 16, 2020 at 9:09 AM Pierre-Yves David <[hidden email]> wrote:
> The first time I saw this proposal I thought:
>
> - That is a great idea, I have been unhappy about these return codes for
> a long time
> - The backward compatibility breakage implications might be huge. We
> probably need to gate this behind config at first.

I'm fine with gating some part of it by a config knob, and will leave it up to the community to decide when to break backwards compatibility in the future.

> - The proposal is good overall, but I have a couple of adjustments and
> improvements in mind.

Would love to hear your thoughts.

And from Google's perspective, I don't feel too strongly about the return codes - I added it because it made sense to have an end-to-end design.
What we'll likely do is intercept the exceptions in our extension and report any failures by type to our servers, rather than relying on the actual codes.
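
A minimal sketch of what "intercept the exceptions and report failures by type" might look like, assuming nothing about the real extension: the exception classes, the `run_and_report` wrapper, and the in-memory counter standing in for a reporting backend are all hypothetical.

```python
from collections import Counter


class InputError(Exception):
    """Hypothetical category: the user supplied bad input."""


class StorageError(Exception):
    """Hypothetical category: repository data could not be read."""


# Stand-in for a real reporting backend: counts failures by type name.
failure_counts = Counter()


def run_and_report(command, *args):
    """Run a command callable and record any failure by exception type.

    The exception is re-raised after being counted, so the caller's
    behavior (and the eventual exit code) is unchanged.
    """
    try:
        return command(*args)
    except Exception as exc:
        failure_counts[type(exc).__name__] += 1
        raise
```

Because the counter is keyed by exception type rather than exit code, a monitoring system could ignore categories such as `InputError` without any change to the codes scripts see.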

Re: Proposal for cleaning up error reporting

Pierre-Yves David-2


On 7/21/20 7:57 AM, Rodrigo Damazio wrote:

> On Thu, Jul 16, 2020 at 9:09 AM Pierre-Yves David <[hidden email]> wrote:
>
>     The first time I saw this proposal I thought:
>
>     - That is a great idea, I have been unhappy about these return codes
>     for a long time
>
>     - The backward compatibility breakage implications might be huge. We
>     probably need to gate this behind config at first.
>
>
> I'm fine with gating some part of it by a config knob, and will leave it
> up to the community to decide when to break backwards compatibility in
> the future.
>
>     - The proposal is good overall, but I have a couple of adjustments and
>     improvements in mind.
>
>
> Would love to hear your thoughts.
>
> And from Google's perspective, I don't feel too strongly about the
> return codes - I added it because it made sense to have an end-to-end
> design.
> What we'll likely do is intercept the exceptions in our extension and
> report any failures by type to our servers, rather than relying on the
> actual codes.

Here are some quick thoughts I had while browsing the document again.

Input: There are different types of input failure; it would be nice to
distinguish between them. For example:

   - revset is invalid and cannot be parsed
   - revset references an unknown revision/name
   - revset is empty (or anything else wrong with the input).
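
One way the three cases above could be told apart is with a small exception hierarchy under a shared base class. This is only a sketch: every class name and the `describe` helper are hypothetical and do not correspond to existing Mercurial exceptions.

```python
class InputError(Exception):
    """Hypothetical base class for all bad-user-input failures."""


class RevsetParseError(InputError):
    """The revset is invalid and cannot be parsed."""


class UnknownRevisionError(InputError):
    """The revset references an unknown revision or name."""


class EmptyRevsetError(InputError):
    """The revset parsed and resolved, but matched nothing."""


def describe(exc):
    """Map an exception to a short label, most specific class first."""
    if isinstance(exc, RevsetParseError):
        return "unparseable revset"
    if isinstance(exc, UnknownRevisionError):
        return "unknown revision or name"
    if isinstance(exc, EmptyRevsetError):
        return "empty revset"
    if isinstance(exc, InputError):
        return "other input error"
    return "not an input error"
```

Callers that only care about "bad input" can still catch `InputError`, while telemetry or tests can match the specific subclass.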

Configuration: unsupported requires should probably be its own
exception. Having a clear signal of "I recognise that I cannot deal with
the repository" is important. There could be some overlap (either the
same error or a similar one) with what we do with the Rust fastpath.

Remote: It would be nice if the remote return code could match the local
one (i.e., the one raised on the server), with an extra bit set. E.g.,
code 20 locally would become code 84 if raised remotely.
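
The arithmetic behind that example can be sketched as below. The bit value 64 is inferred from the numbers given (84 - 20 = 64) and the helper names are hypothetical; the scheme only works if local codes stay below 64, so the flag is unambiguous and the result still fits in the 0-255 range of a process exit status.

```python
REMOTE_BIT = 64  # assumed flag bit, inferred from 20 -> 84 in the example


def to_remote(code):
    """Mark a local error code as having been raised on the remote side."""
    if not 0 <= code < REMOTE_BIT:
        raise ValueError("local codes must be in the range 0..63")
    return code | REMOTE_BIT


def is_remote(code):
    """Return True if the code carries the remote flag."""
    return bool(code & REMOTE_BIT)


def local_part(code):
    """Strip the remote flag, recovering the original local code."""
    return code & ~REMOTE_BIT
```

For instance, `to_remote(20)` gives 84, and `local_part(84)` recovers 20, so a caller can report the same category for both local and remote failures.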

--
Pierre-Yves David

Re: Proposal for cleaning up error reporting

Augie Fackler-2


> On Jul 31, 2020, at 13:26, Pierre-Yves David <[hidden email]> wrote:
>
>
>
> On 7/21/20 7:57 AM, Rodrigo Damazio wrote:
>> On Thu, Jul 16, 2020 at 9:09 AM Pierre-Yves David <[hidden email]> wrote:
>>    The first time I saw this proposal I thought:
>>    - That is a great idea, I have been unhappy about these return codes for a
>>    long time
>>    - The backward compatibility breakage implications might be huge. We
>>    probably need to gate this behind config at first.
>> I'm fine with gating some part of it by a config knob, and will leave it up to the community to decide when to break backwards compatibility in the future.
>>    - The proposal is good overall, but I have a couple of adjustments and
>>    improvements in mind.
>> Would love to hear your thoughts.
>> And from Google's perspective, I don't feel too strongly about the return codes - I added it because it made sense to have an end-to-end design.
>> What we'll likely do is intercept the exceptions in our extension and report any failures by type to our servers, rather than relying on the actual codes.
> Here are some quick thoughts I had while browsing the document again.
>
> Input: There are different types of input failure; it would be nice to distinguish between them. For example:
>
>  - revset is invalid and cannot be parsed
>  - revset references an unknown revision/name
>  - revset is empty (or anything else wrong with the input).
>
> Configuration: unsupported requires should probably be its own exception. Having a clear signal of "I recognise that I cannot deal with the repository" is important. There could be some overlap (either the same error or a similar one) with what we do with the Rust fastpath.
>
> Remote: It would be nice if the remote return code could match the local one (i.e., the one raised on the server), with an extra bit set. E.g., code 20 locally would become code 84 if raised remotely.

Oooh now that's clever. I like it.

Re: Proposal for cleaning up error reporting

Gregory Szorc
In reply to this post by Martin von Zweigbergk via Mercurial-devel
On Fri, Jun 12, 2020 at 6:15 PM Rodrigo Damazio via Mercurial-devel <[hidden email]> wrote:
> Hi everyone.
>
> Any comments on this early proposal are most welcome:
> https://www.mercurial-scm.org/wiki/ErrorCategoriesPlan
>
> Our motivation is to measure the user experience our users are actually getting, and have realtime pager alerts if e.g. a new client or server release causes a big change, but not count or alert on errors that are "not our fault", such as bad user input.
> For instance, we have up to 70% error rates on weekends because some people leave something like "hg status" or "hg log" running in a loop on a terminal, and their credentials expire - those should not be counted at all. Even in the middle of workdays, our running error rate is way more than it "should" be, because we currently measure all non-zero status codes.

I'm overall very supportive of this effort. Feedback:

* I emphatically support expanding the set of internal types to represent errors more granularly and to capture richer metadata.
* We probably need a config knob to control exit codes for BC. We could gate this behind ui.tweakdefaults if we wanted.
* Deleting error.Abort will have a big impact on extensions. It might be best to retain it indefinitely.
* I like being able to better detect unhandled exceptions and "our" errors. If we get to that state, perhaps we could add a config knob where clients can opt in to sending exceptions to a Sentry instance or something so we have more telemetry from the wild.

Overall great proposal!
