Forgejo is written in [Go](https://go.dev), which means Forgejo can leverage [the extensive profiling tooling](https://go.dev/doc/diagnostics) that is provided by Go's runtime.
Profiling in Forgejo can be done in production; it is safe, has low overhead, and no settings need to be enabled for it.
The act of profiling should be done with great responsibility and should not be seen as a magical oracle.
A good knowledge of Forgejo internals helps with drawing conclusions from profiling.
It is easy to draw the wrong conclusions, be overwhelmed by the data, or try to find a needle in a haystack.
If you are reading this because you suspect or are experiencing performance problems with Forgejo, it is recommended to [make an issue](https://codeberg.org/forgejo/forgejo/issues) or contact [the Forgejo developers](https://matrix.to/#/#forgejo-development:matrix.org); this can help in narrowing down where to look or with interpreting profiling results.
The following sections go into more detail about the different types of profiling that can be done in Forgejo.
## Process stacktraces
Most goroutines[^0], whether short-lived or long-lived, are tracked by Forgejo.
In the admin settings, navigate to the "Stacktrace" item in the "Monitoring" menu; this is also the page where the other types of profiling take place.
The two buttons above the segment determine which type of processes you want to see: "Running processes" shows all non-system processes and "Stacktrace" shows all processes.
Note that the information about the processes is only a snapshot of what each process was doing at the time this page was loaded; it is not continuously updated. The processes could no longer exist, or they could have kept running and no longer be at this particular point in the codepath.
Each process in the list has a few key details:
- Title: the description that was given to the process; it should identify why the process was created and what its purpose is.
- Date: this indicates when the process was created.
And a stacktrace. It is possible that a process has more than one stacktrace; in that case the first stacktrace can be seen as the _main_ stacktrace of that process and the others belong to children of that process that were not marked as new processes. The stacktrace is read from top to bottom, where each entry represents a stack frame[^1]. The top entry is the deepest stack frame and the bottom entry is the outermost stack frame, where this process started its lifecycle.
[^1]: This can be conceptually seen as where the program made a function call.
Interpreting the meaning of a stack frame requires a bit of knowledge of Go's runtime. A process that is waiting or polling on a resource (which is often the case, because short-lived processes that do not wait will likely not have any meaningful wall time and finish quickly) is represented by the deepest entry being a `runtime.xxx` function. This simply tells what the process is currently doing, but not how it got to that point; for that you need to search for the topmost entry that starts with `code.gitea.io/gitea/`[^2]. That is the code in the Forgejo codebase responsible for getting to that point.
[^2]: Yes, we are aware and [we are working on it](https://codeberg.org/forgejo/discussions/issues/258).
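To make the top-to-bottom reading concrete, here is a minimal, hypothetical Go sketch (not Forgejo code) of a goroutine that blocks on a channel receive, together with the rough shape of the stacktrace you would see for it while it waits:

```go
package main

import (
	"fmt"
	"time"
)

// waitForResult blocks on a channel receive. While it is blocked, the
// stacktrace of its goroutine would read roughly as follows, deepest frame first:
//
//	runtime.gopark(...)       <- what the goroutine is currently doing
//	runtime.chanrecv(...)     <- still Go runtime internals
//	main.waitForResult(...)   <- first frame of "own" code: how it got here
//	main.main(...)            <- where this goroutine started its lifecycle
//
// In Forgejo the "own" frames start with code.gitea.io/gitea/ instead of main.
func waitForResult(results <-chan int) int {
	return <-results
}

func main() {
	results := make(chan int)
	go func() {
		time.Sleep(2 * time.Second) // keep waitForResult blocked for a while
		results <- 42
	}()
	fmt.Println(waitForResult(results))
}
```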
This type of diagnosis can be helpful if there are many requests that are not being processed quickly enough. By searching for the corresponding URL path on the "Running processes" page, you can find the process that is responsible for processing this particular HTTP request. However, this only works with a bit of luck: the monitoring page has to be loaded in the time period when the actual problem is blocking the process, otherwise the stacktrace is meaningless for determining the problem. Git processes are shown as blocked on a syscall function, which is entirely correct and expected: the syscall invokes the Git binary, and Forgejo logically cannot show what Git is doing or provide a stacktrace of it. Some Git processes (notably `cat-file --batch` and `cat-file --batch-check`) are _long-lived_ Git processes, which are kept open (on demand) in case another call has to be made to one of these two processes while the same HTTP request is being processed; they are closed after the handling of that HTTP request is finished. The blocking is therefore correct in the case of a syscall, and it should rather be checked why the associated HTTP request takes a long time.
## Memory profiling
You can download the heap profile from Forgejo on the stacktrace page in the admin settings.
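The `heap.dat` file is a profile in Go's pprof format. As a rough illustration of what such a file is, here is a minimal sketch using Go's standard `runtime/pprof` package (not Forgejo's actual code) that writes a heap profile to disk:

```go
package main

import (
	"os"
	"runtime/pprof"
)

// Minimal sketch (not Forgejo's actual code) of how a pprof-format heap
// profile such as heap.dat can be produced with Go's runtime/pprof package.
func main() {
	f, err := os.Create("heap.dat")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Passing 0 writes the binary pprof format that `go tool pprof` expects.
	if err := pprof.Lookup("heap").WriteTo(f, 0); err != nil {
		panic(err)
	}
}
```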
Go provides tooling to analyze the `heap.dat` file. Make sure to have this file extracted and then execute `go tool pprof heap.dat` in a terminal in the directory that contains the `heap.dat` file. A complete example of how to make use of this tooling is also [provided by a Go article](https://go.dev/blog/pprof), but to reiterate the most important ways to analyze:
- `topN`: Replace `N` with any number and it will display the top `N` functions that have allocated the most memory. It is important to note that this sorts by the `flat%` value and not the `cum%` value; `flat%` is just the amount of memory the function itself has allocated, while memory allocated by functions it has called is included in the `cum%` value.
- `web`: Opens up a visual goldmine that shows which function uses how much memory and which functions called those functions. A good way to learn how a function was called. This does not show individual calls to a function; for that, `traces <regex>` (which works similarly to the `list` option) can be used.
- `list <regex>`: Replace `<regex>` with a function name or a file name (case sensitive) and it will show exactly which line of source code was responsible for which amount of memory. Note that this requires the source code of Forgejo to be present at a specific path (namely the path where the Forgejo binary was built), so this option is usually only possible if Forgejo was built from source.
To take an example of how to profile memory effectively: in Forgejo a lot of regexps are compiled on startup, which means they will show up in the heap, but e.g. `topN` will only show that `regexp.makeOnePass` and `regexp/syntax.(*compiler).inst` allocated memory; it does not show which regexps are actually taking the most memory (and how much). First we can consult the visualization of traces with `web` and find `regexp.Compile` (this is where a bit of Go knowledge comes in handy, to know that this is the main entry point for compiling regexps, which is ultimately called by Forgejo). In this case it's possible that only two traces are shown: one from inserting routes in the `go-chi` package and one from `data.init` via `MustCompile`. For simplicity we will explore the `go-chi` trace. Executing `traces InsertRoute` indeed shows some traces to `regexp.Compile`, and they actually come from the `addChild` function. You can either search this function on GitHub or use `list addChild` and see the following:
```go
. . 253: if segTyp == ntRegexp {
// ... (lines 254-259 not shown)
. . 260: }
```
There's indeed a call to `regexp.Compile` when the route path contains a regexp (that's important information to understand _why_ it was allocated in the first place and what the precondition for it is). To find out which routes these are, we go back to the output of `traces addChild` and follow the trace back to code in Forgejo. One trace comes from the Git HTTP route and another from the API. To find the exact line in the API and Git HTTP files, you can execute `granularity=lines` and then execute `traces addChild` again. This leads to finding [`routers/web/githttp.go:39`](https://codeberg.org/forgejo/forgejo/src/commit/b54424316410803da5d67fe7315ee931b1c84035/routers/web/githttp.go#L39) and [`routers/api/v1/api.go:1262`](https://codeberg.org/forgejo/forgejo/src/commit/b54424316410803da5d67fe7315ee931b1c84035/routers/api/v1/api.go#L1262). According to Go's memory profile, the regexps created for these two routes take 1MB combined. If we want to avoid this allocation, we now understand that the route must no longer be a regexp, or must use a regexp that requires less allocated memory.[^5] As noted before, the heap profile does not contain the actual heap and thus it's not possible to determine what the content of the heap was and analyze what was contained in the 1MB regexp objects.
[^5]: This approach also involves understanding the source code and behavior of another library to verify whether this is possible and, for this example, understanding why these two particular regexp routes allocated more memory than other regexp routes.
## Performance profiling
When talking about profiling, it usually refers to performance profiling or, more precisely, profiling of where the CPU is spending its time during a time period. It should be noted that if profiling is done on a production instance, you will have no indication of which particular request or task contributed to the CPU time of a particular function; it's merely a sum of the time being spent in each function. If that information is needed, [tracing](#tracing) should be used.
You can download the CPU profile from Forgejo on the stacktrace page in the admin settings. The seconds value is important here; it specifies the time period for which you want to collect a CPU profile, and this value should be chosen somewhat carefully to avoid having too much data; keep it as low as possible. Once you have the `forgejo-diagnosis-xxx.zip` file, extract the `cpu-profile.dat` file from it. You can continue with the explanation of how to use the pprof tool from [the memory profiling section](#pprof-tool); it is largely the same, but now instead of memory, the CPU time is being analyzed.
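Conceptually, collecting a CPU profile for a number of seconds looks like the following minimal sketch using Go's standard `runtime/pprof` package (not Forgejo's actual code); Forgejo performs the equivalent when you request a profile with a seconds value:

```go
package main

import (
	"os"
	"runtime/pprof"
	"time"
)

// Minimal sketch (not Forgejo's actual code) of collecting a CPU profile for a
// fixed time period, similar to requesting a CPU profile with a seconds value.
func main() {
	f, err := os.Create("cpu-profile.dat")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	if err := pprof.StartCPUProfile(f); err != nil {
		panic(err)
	}
	// The CPU is sampled while the rest of the program keeps doing its work.
	time.Sleep(5 * time.Second)
	pprof.StopCPUProfile()
}
```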
Although Forgejo offers Go's CPU profiler, in certain scenarios a wall-clock profiler, where I/O and CPU time are mixed into one profile, might give better results. Forgejo uses [felixge/fgprof](https://github.com/felixge/fgprof) to provide that. This profile is not available by default in the `forgejo-diagnosis-xxx.zip` file; it can only be retrieved by setting `[server].ENABLE_PPROF = true` in the config and then downloading `http://localhost:6060/debug/fgprof?seconds=5`. This must be fetched on the same machine where the Forgejo instance is running.
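For reference, this is roughly how fgprof is exposed as an HTTP endpoint in a Go program; a minimal sketch based on the fgprof documentation, not Forgejo's actual wiring:

```go
package main

import (
	"log"
	"net/http"

	"github.com/felixge/fgprof"
)

// Minimal sketch (not Forgejo's actual wiring) of exposing a wall-clock
// profile endpoint with fgprof. The resulting profile can be analyzed with
// `go tool pprof http://localhost:6060/debug/fgprof?seconds=5`.
func main() {
	http.DefaultServeMux.Handle("/debug/fgprof", fgprof.Handler())
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}
```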