Fix framework processes cleanup #23222

Merged
Selutario merged 3 commits into 4.9.0 from fix/22608-processes-cleanup
May 21, 2024

@GGP1 GGP1 commented May 2, 2024

Related issue
Closes #22608

Description

On startup, the script now checks that a process belongs to the Wazuh daemon before killing it.
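
A minimal sketch of what such an ownership check could look like, assuming a Linux procfs where `/proc/<pid>/cmdline` holds the NUL-separated command line. The helper name `process_belongs_to_daemon` and the `proc_root` parameter are hypothetical and are not taken from the PR; the actual implementation may use a different mechanism.

```python
import os


def process_belongs_to_daemon(pid: int, daemon: str, proc_root: str = "/proc") -> bool:
    """Return True if the command line of `pid` mentions `daemon`.

    Hypothetical helper, not the PR's code: it reads
    <proc_root>/<pid>/cmdline, where arguments are separated by NUL bytes.
    """
    cmdline_path = os.path.join(proc_root, str(pid), "cmdline")
    try:
        with open(cmdline_path, "rb") as f:
            raw = f.read()
    except OSError:
        # The process does not exist or cannot be inspected, so it is
        # treated as not belonging to the daemon.
        return False

    # Split on NUL bytes and compare against the basename of each argument,
    # so both a binary path and a script path passed to an interpreter count.
    args = [a.decode(errors="replace") for a in raw.split(b"\0") if a]
    return any(daemon in os.path.basename(arg) for arg in args)
```

With a check like this, a PID file that points at a process from another service (or at a PID that no longer exists) would not lead to a kill. `proc_root` is only a parameter so the check can be exercised against a fake directory tree.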

Logs/Alerts example

Terminating processes that belong to the daemon

API
root@wazuh-master:/# ls /var/ossec/var/run/
wazuh-agentlessd.start   wazuh-apid.start           wazuh-authd.start               wazuh-csyslogd.start  wazuh-execd.start           wazuh-modulesd.start    wazuh-remoted.state
wazuh-analysisd-179.pid  wazuh-apid_auth-248.pid    wazuh-clusterd-978.pid          wazuh-db-137.pid      wazuh-integratord.start     wazuh-monitord-339.pid  wazuh-syscheckd-188.pid
wazuh-analysisd.start    wazuh-apid_events-251.pid  wazuh-clusterd.start            wazuh-db.start        wazuh-logcollector-329.pid  wazuh-monitord.start    wazuh-syscheckd.start
wazuh-analysisd.state    wazuh-apid_exec-245.pid    wazuh-clusterd_child_0-997.pid  wazuh-dbd.start       wazuh-logcollector.start    wazuh-remoted-202.pid
wazuh-apid-85.pid        wazuh-authd-124.pid        wazuh-clusterd_child_1-998.pid  wazuh-execd-162.pid   wazuh-modulesd-354.pid      wazuh-remoted.start
root@wazuh-master:/# /var/ossec/bin/wazuh-apid -f
wazuh-apid: Orphan child process 251 was terminated.
wazuh-apid: Orphan child process 248 was terminated.
wazuh-apid: Orphan child process 245 was terminated.
wazuh-apid: Orphan child process 85 was terminated.
Starting API in foreground
2024/05/02 15:52:49 INFO: Checking RBAC database integrity...
2024/05/02 15:52:49 INFO: /var/ossec/api/configuration/security/rbac.db file was detected
2024/05/02 15:52:49 INFO: RBAC database integrity check finished successfully
2024/05/02 15:52:51 INFO: Listening on 0.0.0.0:55000..
2024/05/02 15:52:51 INFO: Getting installation UID...
2024/05/02 15:52:51 INFO: Getting updates information...
======== Running on https://0.0.0.0:55000 ========
(Press CTRL+C to quit)
Cluster
root@wazuh-master:/# /var/ossec/bin/wazuh-clusterd -f 
wazuh-clusterd: Orphan child process 978 was terminated.
wazuh-clusterd: Orphan child process 998 was terminated.
wazuh-clusterd: Orphan child process 997 was terminated.
Starting cluster in foreground (pid: 62833)
2024/05/02 17:17:04 INFO: [Local Server] [Main] Serving on /var/ossec/queue/cluster/c-internal.sock
2024/05/02 17:17:04 INFO: [Master] [Main] Serving on ('0.0.0.0', 1516)
2024/05/02 17:17:04 INFO: [Master] [Local integrity] Starting.
2024/05/02 17:17:04 INFO: [Master] [Local agent-groups] Sleeping 30s before starting the agent-groups task, waiting for the workers connection.
2024/05/02 17:17:04 INFO: [Master] [Local integrity] Finished in 0.109s. Calculated metadata of 34 files.

Creating a PID file with the PID of a process from another service (wazuh-modulesd) and restarting the daemon.

Running processes
root@wazuh-master:/# ps aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.0  0.0   4364  1580 ?        Ss   15:52   0:00 bash /scripts/entrypoint.sh wazuh-master master-node master
root          72  0.0  0.0   4628  2500 pts/0    Ss   15:52   0:00 bash
root         124  0.1  0.0 310804  3412 ?        Sl   15:52   0:00 /var/ossec/bin/wazuh-authd
wazuh        137  0.6  0.2 987964 12840 ?        Sl   15:52   0:00 /var/ossec/bin/wazuh-db
root         162  0.0  0.0  23880  1536 ?        Sl   15:52   0:00 /var/ossec/bin/wazuh-execd
wazuh        179  0.6  0.5 3145168 29244 ?       Sl   15:52   0:00 /var/ossec/bin/wazuh-analysisd
root         188  2.0  0.1 331856  7180 ?        SNl  15:52   0:02 /var/ossec/bin/wazuh-syscheckd
wazuh        202  0.0  0.2 1210224 13184 ?       Sl   15:52   0:00 /var/ossec/bin/wazuh-remoted
root         329  0.0  0.0 400832  4968 ?        Sl   15:52   0:00 /var/ossec/bin/wazuh-logcollector
wazuh        339  0.0  0.0  23860  2120 ?        Sl   15:52   0:00 /var/ossec/bin/wazuh-monitord
root         354 51.5  3.2 779180 177564 ?       Sl   15:52   0:55 /var/ossec/bin/wazuh-modulesd
wazuh        978  0.2  0.9 281688 52000 ?        Sl   15:52   0:00 /var/ossec/framework/python/bin/python3 /var/ossec/framework/scripts/wazuh_clusterd.py
wazuh        997  0.0  0.8 134224 47280 ?        S    15:52   0:00 /var/ossec/framework/python/bin/python3 /var/ossec/framework/scripts/wazuh_clusterd.py
wazuh        998  0.0  0.8 134224 47496 ?        S    15:52   0:00 /var/ossec/framework/python/bin/python3 /var/ossec/framework/scripts/wazuh_clusterd.py
wazuh       2548 14.3  1.9 784560 107904 ?       Sl   15:54   0:02 /var/ossec/framework/python/bin/python3 /var/ossec/api/scripts/wazuh_apid.py
wazuh       2549  0.0  1.0 140676 57776 ?        S    15:54   0:00 /var/ossec/framework/python/bin/python3 /var/ossec/api/scripts/wazuh_apid.py
wazuh       2552  0.0  1.0 222604 58076 ?        S    15:54   0:00 /var/ossec/framework/python/bin/python3 /var/ossec/api/scripts/wazuh_apid.py
wazuh       2555  0.0  1.0 435604 57948 ?        S    15:54   0:00 /var/ossec/framework/python/bin/python3 /var/ossec/api/scripts/wazuh_apid.py
root        2687  0.0  0.0   2792   972 ?        S    15:54   0:00 sleep 10
root        2749  0.0  0.0   7064  1580 pts/0    R+   15:54   0:00 ps aux
API
root@wazuh-master:/# touch /var/ossec/var/run/wazuh-apid-354.pid
root@wazuh-master:/# /var/ossec/bin/wazuh-apid -f
wazuh-apid: Orphan child process 2555 was terminated.
wazuh-apid: Orphan child process 2548 was terminated.
wazuh-apid: Orphan child process 2552 was terminated.
wazuh-apid: Orphan child process 2549 was terminated.
wazuh-apid: Process 354 does not belong to wazuh-apid, removing from /var/ossec/run...
Starting API in foreground
2024/05/02 15:54:56 INFO: Checking RBAC database integrity...
2024/05/02 15:54:56 INFO: /var/ossec/api/configuration/security/rbac.db file was detected
2024/05/02 15:54:56 INFO: RBAC database integrity check finished successfully
2024/05/02 15:54:58 INFO: Listening on 0.0.0.0:55000..
2024/05/02 15:54:58 INFO: Getting installation UID...
2024/05/02 15:54:58 INFO: Getting updates information...
======== Running on https://0.0.0.0:55000 ========
(Press CTRL+C to quit)
Cluster
root@wazuh-master:/# touch /var/ossec/var/run/wazuh-clusterd-354.pid
root@wazuh-master:/# /var/ossec/bin/wazuh-clusterd -f
wazuh-clusterd: Orphan child process 63149 was terminated.
wazuh-clusterd: Orphan child process 63150 was terminated.
wazuh-clusterd: Process 354 does not belong to wazuh-clusterd, removing from /var/ossec/run...
wazuh-clusterd: Orphan child process 63148 was terminated.
Starting cluster in foreground (pid: 63286)
2024/05/02 17:17:41 INFO: [Local Server] [Main] Serving on /var/ossec/queue/cluster/c-internal.sock
2024/05/02 17:17:41 INFO: [Master] [Main] Serving on ('0.0.0.0', 1516)
2024/05/02 17:17:41 INFO: [Master] [Local integrity] Starting.
2024/05/02 17:17:41 INFO: [Master] [Local agent-groups] Sleeping 30s before starting the agent-groups task, waiting for the workers connection.
2024/05/02 17:17:42 INFO: [Master] [Local integrity] Finished in 0.107s. Calculated metadata of 34 files.
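
The behaviour shown in the logs above can be approximated with a dry-run sketch. It assumes PID files follow the `<daemon>[_suffix]-<pid>.pid` naming seen in the `ls` output (for example `wazuh-apid_auth-248.pid`). The function `plan_pid_cleanup` and the `belongs` callback are hypothetical names used for illustration, and the sketch only returns a plan; it does not kill anything or delete any file.

```python
import os
import re
from typing import Callable, List, Tuple


def plan_pid_cleanup(
    run_dir: str, daemon: str, belongs: Callable[[int], bool]
) -> List[Tuple[int, str]]:
    """Decide what to do with each PID file of `daemon` found in `run_dir`.

    Returns (pid, action) pairs, where action is "terminate" when the
    process belongs to the daemon and "remove_file" when it does not.
    """
    # Matches "wazuh-apid-85.pid" and "wazuh-clusterd_child_0-997.pid",
    # but not "wazuh-apid.start" or PID files of other daemons.
    pattern = re.compile(rf"^{re.escape(daemon)}(?:_[A-Za-z0-9_]+)?-(\d+)\.pid$")

    plan = []
    for name in sorted(os.listdir(run_dir)):
        match = pattern.match(name)
        if not match:
            continue
        pid = int(match.group(1))
        action = "terminate" if belongs(pid) else "remove_file"
        plan.append((pid, action))
    return plan
```

Passing the ownership check as a callback keeps the file-name parsing separate from the decision of whether a process is safe to kill, so each part can be tested on its own.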

Note

Cluster AIT is failing because of #23195; the fix was merged into 4.8.0 but has not reached 4.9.0 yet.

@GGP1 GGP1 self-assigned this May 2, 2024
@GGP1 GGP1 force-pushed the fix/22608-processes-cleanup branch 4 times, most recently from 1734003 to db5d1a3 Compare May 2, 2024 18:22
@GGP1 GGP1 (Member Author) commented

Modified the name to be consistent with the rest of the files and to avoid doing ugly things in clean_pid_files()

@nico-stefani nico-stefani left a comment

Good Job!
Two things to note:

  • Can you check if the failure of the AIT is related to these changes?
  • With the rename of wazuh_apid.py, would it be necessary to make changes to wazuh-jenkins?

GGP1 commented May 7, 2024

  • Can you check if the failure of the AIT is related to these changes?

The failures are related to #23195.

  • With the rename of wazuh_apid.py, would it be necessary to make changes to wazuh-jenkins?

Good catch, I opened https://github.com/wazuh/wazuh-jenkins/pull/6472 to add it.

@nico-stefani nico-stefani left a comment

Good job!

@Selutario Selutario left a comment
The reason will be displayed to describe this comment to others. Learn more.

These changes look good. However, modifying the name of one of the processes usually has many implications, more than it may seem at first. For example, these checks are failing because wazuh-apid.py was not found:

Processing files: wazuh-manager-4.9.0-0.commit50e245f.x86_64
error: File not found: /build_wazuh/rpmbuild/BUILDROOT/wazuh-manager-4.9.0-0.commit50e245f.x86_64/var/ossec/api/scripts/wazuh-apid.py
RPM build errors:
    File not found: /build_wazuh/rpmbuild/BUILDROOT/wazuh-manager-4.9.0-0.commit50e245f.x86_64/var/ossec/api/scripts/wazuh-apid.py

Probably, in addition to the changes in qa-integration-framework, you should also update some tools that still exist in wazuh-qa such as metrics collection. At least, it was done when wazuh-clusterd was renamed:

We should go through all the other repositories and also notify the cloud team in case they have healthchecks looking for the old process name.

All of this could go beyond the initial scope of the issue. Consider opening a new issue for it.

@GGP1 GGP1 force-pushed the fix/22608-processes-cleanup branch from 50e245f to 18cb803 Compare May 17, 2024 12:45
@Selutario Selutario merged commit 9e47148 into 4.9.0 May 21, 2024
39 of 44 checks passed
@Selutario Selutario deleted the fix/22608-processes-cleanup branch May 21, 2024 15:21
Successfully merging this pull request may close these issues.

wazuh-apid blindly killing pid (found in wazuh-apid.pid) after a sudden reboot