Allow custom cpu limit duration for the watchdog #7348

nyanshak · 2021-10-15T22:51:18Z

[WIP] - Took a stab at implementing #7212. This functionality had some inconsistencies in the docs (around percentages, times, etc.), so I'm still not incredibly confident that this is correct.

I'd like to get some early feedback and corrections from maintainers.

Fixes: #7212

* Add `--watchdog_latency_limit` flag to allow customizing the amount of time osquery is allowed to spend over the CPU utilization limit.
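For readers trying to picture what "time spent over the cpu utilization limit" means, here is a minimal sketch of a sustained-limit check. It is not osquery's implementation: the class name, its fields, the reset-on-recovery behavior, and the 3-second sampling interval are all assumptions made for illustration. Only the idea of a utilization limit plus an allowed duration over it comes from this PR.

```python
from dataclasses import dataclass


@dataclass
class SustainedLimitTracker:
    """Hypothetical helper: tracks how long a process stays over a CPU limit."""

    utilization_limit: float  # percent of one thread, e.g. 10.0
    latency_limit: float      # seconds allowed over the limit (assumed meaning of the new flag)
    seconds_over: float = 0.0

    def observe(self, utilization: float, interval: float) -> bool:
        """Record one sample; return True once the limit was exceeded for too long."""
        if utilization > self.utilization_limit:
            self.seconds_over += interval
        else:
            # Assumption: dropping back under the limit resets the window.
            self.seconds_over = 0.0
        return self.seconds_over > self.latency_limit


# Sample every 3 seconds (an assumed interval) with a 12-second allowance.
tracker = SustainedLimitTracker(utilization_limit=10.0, latency_limit=12.0)
samples = [50.0, 50.0, 5.0, 50.0, 50.0, 50.0, 50.0, 50.0]
results = [tracker.observe(u, interval=3.0) for u in samples]
```

In this sketch the third sample resets the window, so only the final sample, after 15 consecutive seconds over the limit, reports a violation.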

nyanshak · 2021-10-15T22:52:46Z

docs/wiki/installation/cli-flags.md

@@ -102,10 +102,20 @@ If this value is >0 then the watchdog level (`--watchdog_level`) for maximum mem

 `--watchdog_utilization_limit=0`

-If this value is >0 then the watchdog level (`--watchdog_level`) for maximum sustained CPU utilization is overridden. Use this if you would like to allow the `osqueryd` process to use more than 30% of a thread for more than 9 seconds of wall time. The length of sustained utilization is not independently configurable.
+If this value is >0 then the watchdog level (`--watchdog_level`) for maximum sustained CPU utilization is overridden. Use this if you would like to allow the `osqueryd` process to use more than 10% of a thread for more than `--watchdog_latency_limit` seconds of wall time. The length of sustained utilization is configurable with `--watchdog_latency_limit`.

 This value is a maximum number of CPU cycles counted as the `processes` table's `user_time` and `system_time`. The default is 90, meaning less 90 seconds of cpu time per 3 seconds of wall time is allowed.


This line, specifically "default is 90, meaning less 90 seconds of cpu time per 3 seconds of wall time", didn't make a lot of sense to me. I don't know where 90 comes from, or the 3 seconds. I have currently left this unchanged, but I feel like it needs to be corrected / updated.

nyanshak · 2021-11-02

@directionless - hey, since you reviewed the rest of this... do you have enough context to answer this? I didn't really understand this part of the implementation, and I think this should be clarified / corrected.

directionless · 2021-11-01T01:53:29Z

docs/wiki/installation/cli-flags.md

@@ -102,10 +102,20 @@ If this value is >0 then the watchdog level (`--watchdog_level`) for maximum mem

 `--watchdog_utilization_limit=0`

-If this value is >0 then the watchdog level (`--watchdog_level`) for maximum sustained CPU utilization is overridden. Use this if you would like to allow the `osqueryd` process to use more than 30% of a thread for more than 9 seconds of wall time. The length of sustained utilization is not independently configurable.
+If this value is >0 then the watchdog level (`--watchdog_level`) for maximum sustained CPU utilization is overridden. Use this if you would like to allow the `osqueryd` process to use more than 10% of a thread for more than `--watchdog_latency_limit` seconds of wall time. The length of sustained utilization is configurable with `--watchdog_latency_limit`.


This might be fine, but it also seems a little weird?

If the value isn't conforming (like, if it's under 0) we should throw an error, and not silently ignore it.

And it feels weird to have a command line that defaults to 0, and then have some other default somewhere. Why not have the command line default to what we want? (there might be history behind it, but it feels byzantine)

nyanshak · 2021-11-02

It's maybe less weird in context, because there is a different default value depending on `--watchdog_level`, so there isn't really a single "default" value.

For example, "normal" CPU utilization limit is 10%, restrictive is 5%, and off is 100%.

Now... maybe the concept of "watchdog level" is outdated and could be replaced by having defaults for each of these values instead? But as is, it's pretty reasonable to have this set to 0 to distinguish between "use the value set by this flag" and "use the value for the configured watchdog level".

theopolis · 2021-11-03

Agree that this is weird, but it's consistent with the other flags. The weirdness is not net new. :P
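The "0 means use the watchdog level's value" convention discussed in this thread can be sketched roughly as follows. This is an illustration only, not osquery's code: the function name, the dictionary, and the string level names are hypothetical, and raising on negative values models the reviewer's suggestion rather than the behavior that was merged. The per-level percentages are the ones quoted in the thread.

```python
# Per-level CPU utilization limits quoted in the thread (percent of one thread).
LEVEL_UTILIZATION_DEFAULTS = {"normal": 10, "restrictive": 5, "off": 100}


def effective_utilization_limit(flag_value: int, level: str = "normal") -> int:
    """Hypothetical resolver: pick the flag override or the level's default."""
    if flag_value < 0:
        # Models the review suggestion: reject non-conforming values loudly.
        raise ValueError("utilization limit must be >= 0")
    if flag_value > 0:
        # A positive flag value overrides whatever the level would choose.
        return flag_value
    # 0 is the sentinel for "use the value for the configured watchdog level".
    return LEVEL_UTILIZATION_DEFAULTS[level]


normal_default = effective_utilization_limit(0, "normal")
restrictive_default = effective_utilization_limit(0, "restrictive")
overridden = effective_utilization_limit(25, "restrictive")
```

Because each level carries its own value, no single flag default could express all three, which is the reason given above for keeping 0 as the sentinel.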

Issue osquery#7212: Allow custom cpu limit duration

68856a9

* Add `--watchdog_latency_limit` flag to allow customizing amount of time osquery is allowed to spend over the cpu utilization limit.

nyanshak requested review from a team as code owners October 15, 2021 22:51

nyanshak commented Oct 15, 2021


mike-myers-tob added the `feature`, `performance`, and `ready for review` labels Oct 25, 2021

theopolis approved these changes Oct 30, 2021


directionless reviewed Nov 1, 2021


directionless changed the title from "Issue #7212: Allow custom cpu limit duration" to "Allow custom cpu limit duration for the watchdog" Nov 1, 2021

theopolis merged commit d278287 into osquery:master Nov 3, 2021

nyanshak deleted the watchdog-latency-limit branch November 3, 2021 14:59


