[PATCH STABLE] hgweb: garbage collect on every request

Gregory Szorc
# HG changeset patch
# User Gregory Szorc <[hidden email]>
# Date 1520885700 25200
#      Mon Mar 12 13:15:00 2018 -0700
# Branch stable
# Node ID 46905416a5f47e9e46aa2db0e2e4f45e7414c979
# Parent  9639c433be54191b4136b48fe70fc8344d2b5db2
hgweb: garbage collect on every request

There appears to be a cycle in localrepository or hgweb that
is preventing repositories from being garbage collected when
hgwebdir dispatches to hgweb. Every request creates a new
repository instance and then leaks that object and other referenced
objects. A periodic cyclic GC pass will eventually collect the
old repositories. But these passes don't run reliably, and rapid
requests to hgwebdir can make memory consumption climb quickly.

With the Firefox repository, repeated requests to raw-file URLs
leak ~100 MB per hgwebdir request (most of this appears to be
cached manifest data structures). WSGI processes quickly grow
to >1 GB RSS.

Breaking the cycles in localrepository is going to be a bit of
work.

Because we know that hgwebdir leaks localrepository instances, let's
put a band-aid on the problem in the form of an explicit gc.collect()
on every hgwebdir request.
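
For readers unfamiliar with the mechanism, here is a minimal sketch of
why refcounting alone does not free an object that takes part in a
reference cycle, and why an explicit gc.collect() does. The Repo class
and leak_and_collect() are hypothetical stand-ins for illustration, not
Mercurial code, and the behavior shown assumes CPython.

```python
import gc
import weakref


class Repo(object):
    """Hypothetical stand-in for a repository object (not Mercurial's)."""

    def __init__(self):
        # A self-reference is the simplest possible reference cycle.
        self.me = self


def leak_and_collect():
    was_enabled = gc.isenabled()
    gc.disable()  # keep the automatic collector out of the way for the demo
    try:
        gc.collect()  # start from a clean slate
        repo = Repo()
        probe = weakref.ref(repo)
        del repo  # the refcount never reaches zero because of the cycle
        alive_before = probe() is not None
        gc.collect()  # the cycle detector finds and frees the object
        alive_after = probe() is not None
        return alive_before, alive_after
    finally:
        if was_enabled:
            gc.enable()
```

Calling leak_and_collect() reports whether the object was still alive
before and after the explicit collection.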

As the inline comment states, ideally we'd do this in a finally
block for the current request iff it dispatches to hgweb. But
_runwsgi() returns an explicit value. We need the finally to run
after generator exhaustion. So we'd need to refactor _runwsgi()
to "yield" instead of "return." That's too much change for a patch
to stable. So we implement this hack one function above and run
it on every request.
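
The ordering this relies on can be sketched in isolation. In the
example below, inner() and outer() are hypothetical stand-ins (inner()
is a plain generator, unlike the real _runwsgi()); the point is only
that a finally block wrapped around a re-yielding loop runs once the
generator is exhausted or closed, not after the first chunk.

```python
events = []


def inner(req):
    # Hypothetical stand-in that produces the response chunks.
    yield "chunk1:" + req
    yield "chunk2:" + req


def outer(req):
    # Same shape as the patched run_wsgi(): re-yield, then clean up.
    try:
        for r in inner(req):
            yield r
    finally:
        events.append("cleanup")


gen = outer("x")
first = next(gen)
events.append("after first chunk")
rest = list(gen)  # drain the remaining chunks; finally runs here
events.append("after exhaustion")

abandoned = outer("y")
next(abandoned)
abandoned.close()  # GeneratorExit is raised at the yield, so finally runs
events.append("after close")
```

The events list records that cleanup happens only after the last chunk
has been consumed, or when the consumer closes the generator early.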

The performance impact of this change should be minimal. Any
impact should be offset by benefits from not having hgwebdir
processes leak memory.

diff --git a/mercurial/hgweb/hgwebdir_mod.py b/mercurial/hgweb/hgwebdir_mod.py
--- a/mercurial/hgweb/hgwebdir_mod.py
+++ b/mercurial/hgweb/hgwebdir_mod.py
@@ -8,6 +8,7 @@
 
 from __future__ import absolute_import
 
+import gc
 import os
 import re
 import time
@@ -224,8 +225,18 @@ class hgwebdir(object):
     def run_wsgi(self, req):
         profile = self.ui.configbool('profiling', 'enabled')
         with profiling.profile(self.ui, enabled=profile):
-            for r in self._runwsgi(req):
-                yield r
+            try:
+                for r in self._runwsgi(req):
+                    yield r
+            finally:
+                # There are known cycles in localrepository that prevent
+                # those objects (and tons of held references) from being
+                # collected through normal refcounting. We mitigate those
+                # leaks by performing an explicit GC on every request.
+                # TODO remove this once leaks are fixed.
+                # TODO only run this on requests that create localrepository
+                # instances instead of every request.
+                gc.collect()
 
     def _runwsgi(self, req):
         try:
_______________________________________________
Mercurial-devel mailing list
[hidden email]
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

Re: [PATCH STABLE] hgweb: garbage collect on every request

Anton Shestakov
On Mon, 12 Mar 2018 13:17:21 -0700
Gregory Szorc <[hidden email]> wrote:

> # HG changeset patch
> # User Gregory Szorc <[hidden email]>
> # Date 1520885700 25200
> #      Mon Mar 12 13:15:00 2018 -0700
> # Branch stable
> # Node ID 46905416a5f47e9e46aa2db0e2e4f45e7414c979
> # Parent  9639c433be54191b4136b48fe70fc8344d2b5db2
> hgweb: garbage collect on every request

Queued, thanks.

Unfortunately, this patch doesn't help a lot when the evolve extension
is enabled: until I turned it off to test, hgweb consistently consumed
200MB more on every request (on hg-committed), despite gc.collect(). No
idea why.

Re: [PATCH STABLE] hgweb: garbage collect on every request

Gregory Szorc


> On Mar 12, 2018, at 23:18, Anton Shestakov <[hidden email]> wrote:
>
> On Mon, 12 Mar 2018 13:17:21 -0700
> Gregory Szorc <[hidden email]> wrote:
>
>> # HG changeset patch
>> # User Gregory Szorc <[hidden email]>
>> # Date 1520885700 25200
>> #      Mon Mar 12 13:15:00 2018 -0700
>> # Branch stable
>> # Node ID 46905416a5f47e9e46aa2db0e2e4f45e7414c979
>> # Parent  9639c433be54191b4136b48fe70fc8344d2b5db2
>> hgweb: garbage collect on every request
>
> Queued, thanks.
>
> Unfortunately, this patch doesn't help a lot when the evolve extension
> is enabled: until I turned it off to test, hgweb consistently consumed
> 200MB more on every request (on hg-committed), despite gc.collect(). No
> idea why.

:(

My guess is evolve is storing a reference to a repo or some other large data structure in a module-level variable. Leaks like that are pretty easy to spot using a tool like guppy’s heap snapshots.
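
To illustrate that guess (it is only a guess; nothing here is taken
from evolve), the sketch below shows why gc.collect() cannot help when
a module-level variable still holds a strong reference. It uses the
standard library's gc.get_objects() to count live instances rather than
guppy, and _cache, Repo, handle_request() and count_instances() are
hypothetical names.

```python
import gc
import weakref

# Hypothetical module-level cache of the kind guessed at above.
_cache = []


class Repo(object):
    """Hypothetical stand-in for a large repository object."""


def handle_request():
    repo = Repo()
    _cache.append(repo)  # the module keeps a strong reference
    return weakref.ref(repo)


def count_instances(cls):
    # Walk every object the collector tracks and count instances of cls.
    return sum(1 for o in gc.get_objects() if type(o) is cls)


probes = [handle_request() for _ in range(3)]
gc.collect()  # nothing is freed: the objects are reachable, not garbage
still_alive = sum(1 for p in probes if p() is not None)
tracked = count_instances(Repo)

del _cache[:]  # dropping the module-level references is the actual fix
gc.collect()
alive_after_clear = sum(1 for p in probes if p() is not None)
```

A per-type count that grows by one on every request, even after an
explicit collection, points at a live reference rather than a cycle.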