Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: Fix flaky test_daemon_sigint #7888

Merged
merged 1 commit into from
Dec 17, 2022

Conversation

Smjert
Copy link
Member

@Smjert Smjert commented Dec 14, 2022

The test is supposed to verify the correct and clean shutdown when osquery receives a SIGINT.
The clean shutdown though, is only possible when the signal handler is installed, which might not happen immediately,
so there might be a window where if the test sends a SIGINT it will immediately interrupt the osquery process with exit code -2, due to the signal, making the test fail.

Originally the test was waiting for the pidfile to be created by the process, because at that point the signal handler was installed. But a regression was introduced and the pidfile path was created by the test itself before starting osquery,
causing the wait to not work properly.

This restores that logic by removing the creation of that path in the test.
We also ensure that there's no pidfile created by another test.

Fixes #7718

@Smjert Smjert added bug flaky test A unit test that sometimes or on some systems does not return success, when it should labels Dec 14, 2022
@Smjert Smjert requested review from a team as code owners December 14, 2022 18:45
The test is supposed to verify the correct and clean shutdown
when osquery receives a SIGINT.
The clean shutdown though, is only possible when the signal handler
is installed, which might not happen immediately,
so there might be a window where if the test sends a SIGINT
it will immediately interrupt the osquery process with exit code -2,
due to the signal, making the test fail.

Originally the test was waiting for the pidfile to be created
by the process, because at that point the signal handler was installed.
But a regression was introduced and the pidfile path was created
by the test itself before starting osquery,
causing the wait to not work properly.

This restores that logic by removing the creation of that path
in the test.
We also ensure that there's no pidfile created by another test.
@Smjert Smjert force-pushed the stefano/test/fix-daemon-sigint branch from ea09ad6 to ba4a65f Compare December 14, 2022 18:51
@mike-myers-tob mike-myers-tob added the ready for review Pull requests that are ready to be reviewed by a maintainer label Dec 16, 2022
@mike-myers-tob
Copy link
Member

the pidfile path was created by the test itself before starting osquery

I was trying to understand why it would do that, but I will assume it was just a mistake.

The fix looks good to me though!

@Smjert
Copy link
Member Author

Smjert commented Dec 16, 2022

I was trying to understand why it would do that, but I will assume it was just a mistake.

The fix looks good to me though!

Yeah I think there has been a misconception when I was trying to explain the need for having that wait to Alessandro.
The osquery logic for the pidfile handles both the situations where either the pidfile misses or it exists (signaling there's another osqueryd processes running), and optionally containing the pid of the osqueryd process.
But its presence should have nothing to do with the sigint test.. meaning, we are not testing the pidfile check feature and uniqueness of the osqueryd process currently running; but it's needed for it to be missing, due to that "hack", to ensure that a signal handler has been installed.

@mike-myers-tob
Copy link
Member

It looks like this is currently what's failing the CI on master

@Smjert
Copy link
Member Author

Smjert commented Dec 16, 2022

It looks like this is currently what's failing the CI on master

Yeah this is a flaky test that often fails; there's another one which often fails on macOS, which I want to fix next: #7433

@Smjert Smjert merged commit 7734da4 into osquery:master Dec 17, 2022
@Smjert Smjert deleted the stefano/test/fix-daemon-sigint branch December 17, 2022 09:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug flaky test A unit test that sometimes or on some systems does not return success, when it should ready for review Pull requests that are ready to be reviewed by a maintainer
Projects
None yet
Development

Successfully merging this pull request may close these issues.

test_osqueryd.py test_daemon_sigint flaky test
3 participants