Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve scheduled query denylisting and scheduler shutdown #7492

Conversation

Smjert
Copy link
Member

@Smjert Smjert commented Mar 1, 2022

  • Add support for handling a SIGUSR1 signal which will be used on posix
    platforms by the watcher to immediately signal the worker that a
    resource limit has been hit. Logic in the worker will prevent to
    remove trace of the fact that a query was running when the limit was
    hit, by avoiding clearing up the name of the executing query from the database.

  • Improve on the scheduler shutdown logic; even after receiving
    a shutdown request, the scheduler was first executing
    all the queries in the schedule and only then it was exiting.
    This was potentially causing unnecessary forced kills by the watchdog,
    which would result in denylisting queries that were not causing
    any resource limit to be hit, due to them not finishing in time.
    Check for the need to exit after each query execution instead.

  • Separate the permission tests from the watcher tests,
    since the first require root to be run.
    They will still not run as root since they currently fail
    under that user. This will be resolved later.

    Partially addresses Scheduled query that triggers the watchdog not always denylisted #7476

@Smjert
Copy link
Member Author

Smjert commented Mar 2, 2022

A way to reproduce the issue with scheduled queries denylisting, and therefore also test that this PR improves on it, is described here: #7476

To test the change on the scheduler shutdown instead, prepare a list of scheduled queries with a low interval, like 1 second, and the sum of the execution time of all those queries should be higher then 15 seconds.
Then start osquery as a daemon, and as soon as the first scheduled query runs, give a Ctrl+C, so that a request for a clean shutdown starts.
You will notice that instead of finishing executing the query and then exiting, it will continue executing all the queries and then the watchdog will be forced to send a kill to the worker, which will most likely denylist one of those queries.
The improvement made here makes so that if a shutdown request comes just before or during the execution of a query, after that query gets executed, then it will exit.

Finally for the third change I've also opened an issue to track remaining work: #7493

@mike-myers-tob mike-myers-tob added the ready for review Pull requests that are ready to be reviewed by a maintainer label Mar 2, 2022
@Smjert Smjert force-pushed the stefano/improvement/scheduled-query-denylist branch 2 times, most recently from a51cc38 to 473d2ca Compare March 19, 2022 10:35
@Smjert Smjert removed the ready for review Pull requests that are ready to be reviewed by a maintainer label Mar 19, 2022
- Add support for handling a SIGUSR1 signal which will be used on posix
  platforms by the watcher to immediately signal the worker that a
  resource limit has been hit. Logic in the worker will prevent to
  remove trace of the fact that a query was running when the limit was
  hit, by avoiding clearing up the name of the executing query from the database.

- Improve on the scheduler shutdown logic; even after receiving
  a shutdown request, the scheduler was first executing
  all the queries in the schedule and only then it was exiting.
  This was potentially causing unnecessary forced kills by the watchdog,
  which would result in denylisting queries that were not causing
  any resource limit to be hit, due to them not finishing in time.
  Check for the need to exit after each query execution instead.

- Separate the permission tests from the watcher tests,
  since the first require root to be run.
  They will still not run as root since they currently fail
  under that user. This will be resolved later.
@Smjert Smjert force-pushed the stefano/improvement/scheduled-query-denylist branch from 473d2ca to dff3b85 Compare April 15, 2022 16:10
Copy link
Member

@directionless directionless left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I didn't test this, both the intent and my read of the code seem fine.

@@ -155,9 +156,13 @@ void initWorkDirectories() {
void signalHandler(int num) {
int rc = 0;

if (num == SIGUSR1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might quibble over signal choices, something from setitimer, like SIGXCPU, feels appropriate. But 🤷 seems fine I guess?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a custom behavior that should only happen in a specific situation, I wanted to use a custom signal that isn't used by anything else with a different meaning.
If we had the watchdog communicating via socket to the worker, that would've been a packet instead.

@mike-myers-tob mike-myers-tob added this to the 5.4.0 milestone Apr 29, 2022
@Smjert Smjert merged commit 365babd into osquery:master Jun 1, 2022
@Smjert Smjert deleted the stefano/improvement/scheduled-query-denylist branch June 1, 2022 18:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants