-
-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix crash due to interaction between distributed and config plugin #7504
Fix crash due to interaction between distributed and config plugin #7504
Conversation
Move the distributed plugin RocksDB metadata to a different domain which is not used or touched by the config plugin. Also do not acquire the pending queries multiple times just to count them. Acquire them once, use the obtained vector to count them and then use them to proceed executing the queries.
This has been reported to us and was happening rarely/around once a week. The crash happens here osquery/osquery/distributed/distributed.cpp Line 261 in 84f8b94
when it's trying to substring the query SQL value, which in the crash case is a nullptr. The immediate reason is that the array queries is empty and there's no check for it to be empty or not.
Looking at the rest of the logic though one notices that osquery/osquery/distributed/distributed.cpp Lines 127 to 128 in 84f8b94
Also note that both getPendingQueryCount and popRequest re-scan the database to get all the pending queries.
Due to this issue is then possible that something else deletes a key in between the Now about what is deleting the key, this is the Config plugin logic. osquery/osquery/config/config.cpp Line 919 in 84f8b94
This function has the job to go through all the keys in the So gluing all together here's what happens:
I went with changing the domain so that the distributed plugin and the config/scheduler plugin do not touch the same keys because the logic that there's behind the config purge and scheduler doesn't make sense for the distributed plugin, where the values there, are somewhat temporary already and they are stored in the database just to keep track of pending queries if osquery should die/crash in between executions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty simple to me. Don't know if @zwass or @theopolis want to comment
Move the distributed plugin RocksDB metadata to a different domain
which is not used or touched by the config plugin.
Also do not acquire the pending queries multiple times just to count
them. Acquire them once, use the obtained vector to count them
and then use them to proceed executing the queries.