Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow custom cpu limit duration for the watchdog #7348

Merged
merged 1 commit into from
Nov 3, 2021

Conversation

nyanshak
Copy link
Contributor

@nyanshak nyanshak commented Oct 15, 2021

[WIP] - Took a stab at implementing #7212. This functionality had some inconsistencies in the docs (around percentages, times, etc.), so I'm still not incredibly confident that this is correct.

I'd like to get some early feedback / corrections to make from maintainers.

Fixes: #7212

* Add `--watchdog_latency_limit` flag to allow customizing amount of
  time osquery is allowed to spend over the cpu utilization limit.
@nyanshak nyanshak requested review from a team as code owners October 15, 2021 22:51
@@ -102,10 +102,20 @@ If this value is >0 then the watchdog level (`--watchdog_level`) for maximum mem

`--watchdog_utilization_limit=0`

If this value is >0 then the watchdog level (`--watchdog_level`) for maximum sustained CPU utilization is overridden. Use this if you would like to allow the `osqueryd` process to use more than 30% of a thread for more than 9 seconds of wall time. The length of sustained utilization is not independently configurable.
If this value is >0 then the watchdog level (`--watchdog_level`) for maximum sustained CPU utilization is overridden. Use this if you would like to allow the `osqueryd` process to use more than 10% of a thread for more than `--watchdog_latency_limit` seconds of wall time. The length of sustained utilization is configurable with `--watchdog_latency_limit`.

This value is a maximum number of CPU cycles counted as the `processes` table's `user_time` and `system_time`. The default is 90, meaning less 90 seconds of cpu time per 3 seconds of wall time is allowed.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line, specifically "default is 90, meaning less 90 seconds of cpu time per 3 seconds of wall time", didn't make a lot of sense to me. I don't know where 90 comes from, or the 3 seconds. I have currently left this unchanged, but I feel like it needs to be corrected / updated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@directionless - hey, since you reviewed the rest of this... do you have enough context to answer this? I didn't really understand this part of the implementation, and I think this should be clarified / corrected.

@mike-myers-tob mike-myers-tob added feature performance ready for review Pull requests that are ready to be reviewed by a maintainer labels Oct 25, 2021
@@ -102,10 +102,20 @@ If this value is >0 then the watchdog level (`--watchdog_level`) for maximum mem

`--watchdog_utilization_limit=0`

If this value is >0 then the watchdog level (`--watchdog_level`) for maximum sustained CPU utilization is overridden. Use this if you would like to allow the `osqueryd` process to use more than 30% of a thread for more than 9 seconds of wall time. The length of sustained utilization is not independently configurable.
If this value is >0 then the watchdog level (`--watchdog_level`) for maximum sustained CPU utilization is overridden. Use this if you would like to allow the `osqueryd` process to use more than 10% of a thread for more than `--watchdog_latency_limit` seconds of wall time. The length of sustained utilization is configurable with `--watchdog_latency_limit`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be fine, but it also seems a little weird?

If the value isn't conforming (like, if it's under 0) we should throw an error, and not silently ignore it.

And it feels weird to have a command line that defaults to 0, and then have some other default somewhere. Why not have the command line default to what we want? (there might be history behind it, but it feels byzantine)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's maybe less weird in context, because there is a different default value depending on --watchdog_level, so there isn't really a single "default" value.

For example, "normal" CPU utilization limit is 10%, restrictive is 5%, and off is 100%.

Now... maybe the concept of "watchdog level" is outdated and could be replaced by having defaults for each of these values instead? But as is, it's pretty reasonable to have this set to 0 to distinguish between "use the value set by this flag" and "use the value for the configured watchdog level".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that this is weird, but it's consistent with the other flags. The weirdness is not net new. :P

@directionless directionless changed the title Issue #7212: Allow custom cpu limit duration Allow custom cpu limit duration for the watchdog Nov 1, 2021
@theopolis theopolis merged commit d278287 into osquery:master Nov 3, 2021
@nyanshak nyanshak deleted the watchdog-latency-limit branch November 3, 2021 14:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature performance ready for review Pull requests that are ready to be reviewed by a maintainer
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Provides option to customize the CPU usage limit duration time
4 participants