-
-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve scheduled query denylisting and scheduler shutdown #7492
Improve scheduled query denylisting and scheduler shutdown #7492
Conversation
7d3dd39
to
20f27f3
Compare
A way to reproduce the issue with scheduled queries denylisting, and therefore also test that this PR improves on it, is described here: #7476 To test the change on the scheduler shutdown instead, prepare a list of scheduled queries with a low interval, like 1 second, and the sum of the execution time of all those queries should be higher then 15 seconds. Finally for the third change I've also opened an issue to track remaining work: #7493 |
a51cc38
to
473d2ca
Compare
- Add support for handling a SIGUSR1 signal which will be used on posix platforms by the watcher to immediately signal the worker that a resource limit has been hit. Logic in the worker will prevent to remove trace of the fact that a query was running when the limit was hit, by avoiding clearing up the name of the executing query from the database. - Improve on the scheduler shutdown logic; even after receiving a shutdown request, the scheduler was first executing all the queries in the schedule and only then it was exiting. This was potentially causing unnecessary forced kills by the watchdog, which would result in denylisting queries that were not causing any resource limit to be hit, due to them not finishing in time. Check for the need to exit after each query execution instead. - Separate the permission tests from the watcher tests, since the first require root to be run. They will still not run as root since they currently fail under that user. This will be resolved later.
473d2ca
to
dff3b85
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I didn't test this, both the intent and my read of the code seem fine.
@@ -155,9 +156,13 @@ void initWorkDirectories() { | |||
void signalHandler(int num) { | |||
int rc = 0; | |||
|
|||
if (num == SIGUSR1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might quibble over signal choices, something from setitimer
, like SIGXCPU
, feels appropriate. But 🤷 seems fine I guess?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is a custom behavior that should only happen in a specific situation, I wanted to use a custom signal that isn't used by anything else with a different meaning.
If we had the watchdog communicating via socket to the worker, that would've been a packet instead.
Add support for handling a SIGUSR1 signal which will be used on posix
platforms by the watcher to immediately signal the worker that a
resource limit has been hit. Logic in the worker will prevent to
remove trace of the fact that a query was running when the limit was
hit, by avoiding clearing up the name of the executing query from the database.
Improve on the scheduler shutdown logic; even after receiving
a shutdown request, the scheduler was first executing
all the queries in the schedule and only then it was exiting.
This was potentially causing unnecessary forced kills by the watchdog,
which would result in denylisting queries that were not causing
any resource limit to be hit, due to them not finishing in time.
Check for the need to exit after each query execution instead.
Separate the permission tests from the watcher tests,
since the first require root to be run.
They will still not run as root since they currently fail
under that user. This will be resolved later.
Partially addresses Scheduled query that triggers the watchdog not always denylisted #7476