-
-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix a race condition during the shutdown of the worker process #5943
Fix a race condition during the shutdown of the worker process #5943
Conversation
When a thread different from the main requests a shutdown through Initializer::requestShutdown, it should not call waitForShutdown; there's no reason to wait, moreover the function doesn't only wait, but also actually stops other components and then finally calls exit(). Since the main thread is already inside the waitForShutdown call waiting on Dispatcher::joinServices or inside the shutdown() callable on Windows, having a secondary thread do the same work potentially at the same time is wrong. Moreover calling exit() from a secondary thread is most of the time incorrect. The waitForShutdown function has been renamed to waitThenShutdown to better represent what it's actually doing.
The issue has been initially highlighted by running the test_osqueryd.py test, specifically the
An example is in this build https://dev.azure.com/trailofbits/osquery/_build/results?buildId=1254 With TSAN help I've found that when killing the parent process, the worker, in a different thread than the main, notices that the watchdog is dead so requests a shutdown through requestShutdown(). Lines 756 to 771 in 33e5fcb
which though not only waits, but also actually shutdowns various components. Moreover since Lines 727 to 741 in 33e5fcb
and since WatcherWatcherRunner is itself a service added to the Dispatcher, the thread will end up joining on itself.
Another issue, ignoring joining on itself, is that waitForShutdown ends up calling exit(), from a secondary thread. Other useful information is that the main thread calls Lines 97 to 112 in 33e5fcb
and waits on Dispatcher::joinServices, so the exit() will be called; it only needs to be waked up by something and then it will do the correct shutdown procedure. This is done already by requestShutdown(), which calls Dispatcher::stopServices(). |
Following the TSAN log, which shows that there's usage/destruction of the same object from two different threads:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the change in this PR, other than the naming clarification.
raise(SIGUSR1); | ||
} else { | ||
// The main thread is requesting a shutdown, meaning in almost every case | ||
// it is NOT waiting for a shutdown. | ||
// Exceptions include: tight request / wait in an exception handler or | ||
// custom signal handling. | ||
Dispatcher::stopServices(); | ||
waitForShutdown(); | ||
|
||
if (current_thread_id == kMainThreadId) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this check superfluous? The else is only exercised if !(current_thread_id != kMainThreadId)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous if doesn't only check for the thread id, but also if FLAGS_enable_signal_handler
is enabled, which by default it's not.
This means that other threads can end up calling the else path, which caused the bug.
This is also shown by the TSAN callstack, where you have WatcherWatcherRunner entering in waitForShutdown through requestShutdown() and then calling exit().
When a thread different from the main requests a shutdown
through Initializer::requestShutdown, it should not call
waitForShutdown; there's no reason to wait, moreover the function
doesn't only wait, but also actually stops other components and then
finally calls exit().
Since the main thread is already inside the waitForShutdown call
waiting on Dispatcher::joinServices or inside the shutdown() callable
on Windows, having a secondary thread do
the same work potentially at the same time is wrong.
Moreover calling exit() from a secondary thread is most of the time
incorrect.
The waitForShutdown function has been renamed to waitThenShutdown
to better represent what it's actually doing.