Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ETW-based process events table for Windows #7821

Merged
merged 29 commits into from
Feb 7, 2023

Conversation

zwass
Copy link
Member

@zwass zwass commented Nov 17, 2022

This PR implements POTE (see Blueprint) and the new evented table process_etw_events.

Blueprint: #7826

This work done by @marcosd4h. I just opened the PR.

@zwass zwass requested review from a team as code owners November 17, 2022 17:25
Copy link
Member Author

@zwass zwass left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only took a quick look so far (mostly at the schema).

Very glad to see the thorough comments on all the ETW "framework" header files! Thank you for doing that.

After we get this initial work merged into osquery core, it would be really helpful if you could write up some documentation in the osquery repo about how further ETW tables should be built using the framework.

table_name("etw_process_events")
description("Windows process execution events.")
schema([
Column("type", TEXT, "Event Type"),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would help to see the possible values. Is this right?

Suggested change
Column("type", TEXT, "Event Type"),
Column("type", TEXT, "Event Type (ProcessStart, ProcessStop)"),

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point! I think having the possible types listed here will help

Comment on lines 20 to 22
Column("header_pid", BIGINT, "Process ID of the process reporting the event", hidden=True),
Column("process_sequence_number", BIGINT, "Process Sequence Number - Present only on ProcessStart events", hidden=True),
Column("parent_process_sequence_number", BIGINT, "Parent Process Sequence Number - Present only on ProcessStart events", hidden=True),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these columns hidden because they are mostly useful for dev/debuggign?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the process_sequence_number unique at some level? Would be useful to have a non-PID unique identifier for a process

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A corner case scenario on Windows called "PID reuse" can happen when the same PIDs get reused during OS runtime lifecycle. This is something very unlikely that OS developers decided to address by including the concept of the "process sequence number" on the ETW fields. I'm including this data as a hidden value and exposing this information in the unlikely event this is needed.

@zwass yes, these could be useful for debugging scenarios

@defensivedepth yes, they are unique

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcosd4h How unique? Sysmon generates a GUID for each new process creation (https://learn.microsoft.com/en-us/sysinternals/downloads/sysmon#event-id-1-process-creation) specifically because of the issues around PIDs.

I would recommend that we do not hide the sequence number columns. For security use cases / correlation, some type of GUID would be better than PIDs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcosd4h Wanted to make sure my reply didn't get lost in all the other comments

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @defensivedepth, sorry for the late response! I totally miss your response.

This sequence value the code is exposing is generated by the kernel itself. The code is just grabbing it from the ETW event and exposing it to the table. In other words, the code does not control this value; it just relies on the kernel-generated value to be unique for the current os boot session.

libraries/cmake/source/krabsetw/config/krabs.cpp Outdated Show resolved Hide resolved
Copy link
Member

@directionless directionless left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No real comments, but I'm excited to see it!

@@ -0,0 +1,29 @@
table_name("etw_process_events")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This isn't blocking)

Does it make sense to expose any of this in the existing process_events? I feel like we've never really had a good answer for how we handle that, and the migration between event subsytems. (process_events vs bpf_process_events vs es_process_events)

I guess maybe it makes more sense to let them all diverge, and people can write platform specific SQL?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this yesterday with @zwass and other folks during the Osquery office hours. There is still not a clear decision on how to move fwd. We most likely go ahead with this change and then explore options to converge es_process_events, bpf_process_events, and etw_process_events into a single table.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My takeaway from office hours was:

  • We should keep separate tables
  • Future work to explore a merged process_events
  • We should probably name them process_events_etw so like tables sort together

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed office hours, apologies, just my 2c here. I'm not a huge fan of diverging tables, as it puts the data in different places, and it's not nearly as intuitive as there being just one process_events table. The diverging schemas was what we had originally shipped OS specific columns for I thought.
I'm not going to stand in the way of progress here, just wanted to voice my thoughts on this.

Copy link
Contributor

@marcosd4h marcosd4h Dec 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @muffins, thanks for providing your input here. During office-hours meeting we decided to merge es_process_events, bpf_process_events, and etw_process_events into a single table. There has to be some work involved to decide how to merge the schemas through os-specific columns.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcosd4h thanks for the context! This is what I get for not coming to OH :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://hackmd.io/F9uUgvUkQuuyfrPoZbstYg?view#ETW-events-table--Process-events

I think we were more nuanced. IIRC the extended schema has some issues under the hood, and it's often hard to see how to combine things.

So we were mulling on the idea of having divergent tables, and also a single table with a simplified set of things. But we didn't think we should block here on that.

We did most think we should start naming them process_events_XXXX so they sorted together

Copy link
Contributor

@marcosd4h marcosd4h Feb 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@directionless @zwass I ended up naming the table process_etw_events. The main rationale for picking this name instead of process_events_XXXX is that evented tables must have the _events suffix at the end of the table name

specs/windows/etw_process_events.table Outdated Show resolved Hide resolved
@mike-myers-tob mike-myers-tob added feature virtual tables events Related to osquery's evented tables or eventing subsystem Windows labels Nov 22, 2022
@mike-myers-tob mike-myers-tob changed the title Feature/windows process events Add ETW-based process events table for Windows Nov 22, 2022
@mike-myers-tob
Copy link
Member

About the failing Windows test:

81: Test command: D:\a\osquery\osquery\w\install\cmake-3.21.4-windows-x86_64\bin\cmake.exe "-E" "env" "PYTHONPATH=D:/a/osquery/osquery/w/build/python_path" "C:/hostedtoolcache/windows/Python/3.11.0/x64/python3.exe" "-u" "test_release.py" "--build_dir=D:/a/osquery/osquery/w/build" "--verbose"
81: Test timeout computed to be: 300
81: test_linked_system_libraries (__main__.ReleaseTests.test_linked_system_libraries) ... FAIL
81: test_no_nonsystem_link (__main__.ReleaseTests.test_no_nonsystem_link) ... skipped 'Test for Darwin and Linux only'
81: 
81: ======================================================================
81: FAIL: test_linked_system_libraries (__main__.ReleaseTests.test_linked_system_libraries)
81: ----------------------------------------------------------------------
81: Traceback (most recent call last):
81:   File "D:\a\osquery\osquery\w\build\tools\tests\test_release.py", line 155, in test_linked_system_libraries
81:     self.fail(
81: AssertionError: Found these additional unwanted libraries linked:
81:     tdh.dll
81: 
81: ----------------------------------------------------------------------
81: Ran 2 tests in 0.065s
81: 
81: FAILED (failures=1, skipped=1)
81/82 Test #81: tools_tests_testrelease ..........................................***Failed    0.18 sec

@lcostantino
Copy link

Hi @zwass , excellent work!, BTW i had a POC for the same, but using a different provider (not the kernel one) with further modifications to share code between process_events and processes while getting information not present in ETW event.

In case you are interested to add some improvements or fields in a future PR, feel free to use some of those changes. (lcostantino#1)

@marcosd4h
Copy link
Contributor

Hi @zwass , excellent work!, BTW i had a POC for the same, but using a different provider (not the kernel one) with further modifications to share code between process_events and processes while getting information not present in ETW event.

In case you are interested to add some improvements or fields in a future PR, feel free to use some of those changes. (lcostantino#1)

@lcostantino Flaco!! mann, it's really a small world, after all! I'm happy to see you here. Also, glad to see that you are thinking that adding ETW support to Osquery is the way to go. Would it be possible for you to code review this PR if you have a chance? I agree that there are things in your PoC that can be used to enrich the process events created on this PR. I also created a framework to add new ETW-based Event publishers, described in the blueprint [here]. (#7826). Your feedback on the blueprint will be greatly appreciated. Are you on the Osquery slack by chance?

@marcosd4h
Copy link
Contributor

Small clarification here: @zwass created this PR by mistake while we were reviewing the initial specs for this new evented table. That's why there are confusing details in the PR description.

This PR introduces POTE (Programmable OS Tracing Engine) framework + a new windows evented table called etw_process_events which is built on top of POTE. The main purpose of this new evented table is to audit process create/termination details.

Having POTE in place will simplify the addition of future evented tables as POTE provides a simplified mechanism to create ETW-based Event publishers.

The PR blueprint is detailed here.

@lcostantino
Copy link

Hi @zwass , excellent work!, BTW i had a POC for the same, but using a different provider (not the kernel one) with further modifications to share code between process_events and processes while getting information not present in ETW event.
In case you are interested to add some improvements or fields in a future PR, feel free to use some of those changes. (lcostantino#1)

@lcostantino Flaco!! mann, it's really a small world, after all! I'm happy to see you here. Also, glad to see that you are thinking that adding ETW support to Osquery is the way to go. Would it be possible for you to code review this PR if you have a chance? I agree that there are things in your PoC that can be used to enrich the process events created on this PR. I also created a framework to add new ETW-based Event publishers, described in the blueprint [here]. (#7826). Your feedback on the blueprint will be greatly appreciated. Are you on the Osquery slack by chance?

Hey Marcos, didn't knew it was you!!! happy to see you, sure i will check the PR. (not in OSQUERY slack yet)

T pop() {
std::unique_lock<std::mutex> lock(mutex_);
while (queue_.empty()) {
condition_.wait(lock);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be safer to use wait with predicate for empty? (spurious wake should also be covered with that)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point! Thanks for bringing it up. I've just modified the wait call to use a predicate to avoid lost or spurious wakeups

@Smjert
Copy link
Member

Smjert commented Dec 12, 2022

@marcosd4h for the CVEs of krabsetw, the CI script will ask you to insert all fields for the library, but I checked in the NVD database and there's no CPE for the library.
You might want to check this new page and section: https://osquery.readthedocs.io/en/latest/development/cve-scan/#libraries-without-a-cpe

I think I need to improve the very first message when the library is not found, so that it points to the instructions for adding a new library and handling special cases.

@swannman
Copy link

swannman commented Dec 12, 2022

@marcosd4h for the CVEs of krabsetw, the CI script will ask you to insert all fields for the library, but I checked in the NVD database and there's no CPE for the library.

Good catch @Smjert - I'll start a thread with NIST per this guidance to get a CPE created for krabsetw. Thanks for pointing this out!

@Smjert
Copy link
Member

Smjert commented Dec 12, 2022

Good catch @Smjert - I'll start a thread with NIST per this guidance to get a CPE created for krabsetw. Thanks for pointing this out!

Hello @swannman and thanks! That's definitely useful!

@marcosd4h
Copy link
Contributor

marcosd4h commented Dec 12, 2022

@marcosd4h for the CVEs of krabsetw, the CI script will ask you to insert all fields for the library, but I checked in the NVD database and there's no CPE for the library. You might want to check this new page and section: https://osquery.readthedocs.io/en/latest/development/cve-scan/#libraries-without-a-cpe

I think I need to improve the very first message when the library is not found, so that it points to the instructions for adding a new library and handling special cases.

Thanks @Smjert, for pointing this out. I updated the branch to get the new CI scripts and updated the library manifest file. BTW, the version field was required and KrabsETW library does not have tagged released versions, so I ended up just using 0.0.0.0. Let me know if that works here.

@Smjert
Copy link
Member

Smjert commented Dec 12, 2022

Thanks @Smjert, for pointing this out. I updated the branch to get the new CI scripts and updated the library manifest file. BTW, the version field was required and KrabsETW library does not have tagged released versions, so I ended up just using 0.0.0.0 here. Let me know if that works here.

Yeah the version is mostly for human readers when there's no CPE, but if there's no version it's fine too, the commit sha is actually what gets compared in the repo.

Copy link
Contributor

@muffins muffins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presuming you've changed the items you've noted, this is largely looking good to me. As mentioned I'd really love to have another C++ human look through, esp for potential design improvements, but overall I think it's fine.

I'm happy to stamp if needed, but I'm going to let other folks have a chance to look through. If no one else steps up ping me and I'm happy to approve.

case EtwProviderConfig::EtwKernelProviderType::File: {
// https://learn.microsoft.com/en-us/windows/win32/etw/fileio
kernelProvider = std::make_shared<krabs::kernel::file_io_provider>();
} break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think you want these breaks inside of the curlys, keep all the code for each case together inside.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems easier to read to me as well. Any reason not to do it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point; I've just fixed that.

@@ -0,0 +1,29 @@
table_name("etw_process_events")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcosd4h thanks for the context! This is what I get for not coming to OH :)

@muffins muffins dismissed their stale review December 15, 2022 04:59

Stale review.

@marcosd4h
Copy link
Contributor

Hey @muffins, I hope you are doing great! I wanted to let you know that I've just pushed integration and events tests for the ETW logic. I've also included tests for the concurrent queue code. Adding tests was a great idea as it helped me to improve the publisher shutdown logic. Thanks for bringing this up! Also, sorry for the delay in getting the tests in place; I've been buried with high-priority work at FleetDM and didn't have the time the last few weeks to work on this.

@muffins
Copy link
Contributor

muffins commented Jan 7, 2023

@marcosd4h no stress on the time delays, it was also holidays so I wasn't looking at computers much! I'll try and take another look through everything over the next week or two. Hopefully we can get this merged up soon. Thanks a ton for bearing with us and writing tests, I know everyone will be pumped for the extra assurances 🙏 🥳 🧑‍🏭

@marcosd4h
Copy link
Contributor

@marcosd4h no stress on the time delays, it was also holidays so I wasn't looking at computers much! I'll try and take another look through everything over the next week or two. Hopefully we can get this merged up soon. Thanks a ton for bearing with us and writing tests, I know everyone will be pumped for the extra assurances 🙏 🥳 🧑‍🏭

Thanks @muffins, I hope you had a great start of the year!! I'm looking forward to the upcoming review. I'm open to work on any change that might be needed to get this merged 💯 🥳 🥳 🥳

Copy link
Member Author

@zwass zwass left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcosd4h this is mostly nits and some requests for clarification.

This is tricky to read as such a huge PR, but it definitely helps that you documented many of the class members and methods. Thanks!


namespace osquery {

// Returns a reference to the single global EtwController instance
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Returns a reference to the single global EtwController instance
// Returns a reference to the single global EtwController instance

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

return instance;
}

// New events get store in the post-processing queue
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// New events get store in the post-processing queue
// New events get stored in the post-processing queue

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

case EtwProviderConfig::EtwKernelProviderType::File: {
// https://learn.microsoft.com/en-us/windows/win32/etw/fileio
kernelProvider = std::make_shared<krabs::kernel::file_io_provider>();
} break;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems easier to read to me as well. Any reason not to do it?

void KernelEtwSessionRunnable::initKernelTraceSession(
const std::string& sessionName) {
if (sessionName.empty()) {
return;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Log?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

void KernelEtwSessionRunnable::stopKernelTraceSession(
const std::string& sessionName) {
if (sessionName.empty()) {
return;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Log?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

void EtwPublisherProcesses::initializeHardVolumeConversions() {
const std::string validDriveLetters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

for (auto& driveLetter : validDriveLetters) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment says done but looks like change wasn't pushed?

Comment on lines 401 to 404
processStartAggregationCache_.erase(it++);
} else {
++it;
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the same? Feels easier to understand for me to have the iterator advance outside the conditional

Suggested change
processStartAggregationCache_.erase(it++);
} else {
++it;
}
processStartAggregationCache_.erase(it);
}
++it;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion, I think the code is more readable now. I'm going to try out this change and see how it goes

strlen(userNameStr.data()) == 0 ||
strlen(userNameStr.data()) >= MAX_PATH ||
sidType == SID_NAME_USE::SidTypeInvalid) {
// Inserting empty username to avoid the lookup logic to be called again
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this comment but I don't actually see the insert. Am I misunderstanding?

Maybe log here as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I've just added the missing insert here. This helper is used to populate an attribute of the resulting event, I'd prefer not to log here

Comment on lines 483 to 493
case TOKEN_ELEVATION_TYPE::TokenElevationTypeDefault: {
tokenInfo.assign("TokenElevationTypeDefault");
} break;

case TOKEN_ELEVATION_TYPE::TokenElevationTypeFull: {
tokenInfo.assign("TokenElevationTypeFull");
} break;

case TOKEN_ELEVATION_TYPE::TokenElevationTypeLimited: {
tokenInfo.assign("TokenElevationTypeLimited");
} break;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use the same break in the curly braces syntax here please.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

(ULONG)sizeof(EVENT_TRACE_PROPERTIES);

/// Best effort to stop ongoing trace session
ControlTraceA(NULL,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be nice to log the return value

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, just fixed this

Copy link
Member Author

@zwass zwass left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcosd4h thank you for addressing the comments!

This is looking ready to merge to me. I can't approve it since I opened the PR even though Marcos wrote all the code.

Copy link
Member

@directionless directionless left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A somewhat blind 👍, since @zwass would have, but PR authorship happened. Consider the magic co-authored-by silliness.

@zwass
Copy link
Member Author

zwass commented Feb 6, 2023

uhoh, GitHub wants to mark me as the author in the squash commit. Anyone know a way to resolve that? I didn't write a single line of this.

@directionless
Copy link
Member

uhoh, GitHub wants to mark me as the author in the squash commit. Anyone know a way to resolve that? I didn't write a single line of this.

Yes, that's how it works. The best you can do is Co-authored-by: Marcos Oviedo <marcos@fleetdm.com>

Could probably close this, and have Marcos open anew

@zwass
Copy link
Member Author

zwass commented Feb 6, 2023

Maybe I can squash this locally into a commit with Marcos' name on it and then we can "rebase and merge" that?

Marcos is out on Vacation this week so I'd rather not bother him to open another PR, and it would be nice to have the commit associated with all the comments here.

@directionless
Copy link
Member

Maybe I can squash this locally into a commit with Marcos' name on it and then we can "rebase and merge" that?

Marcos is out on Vacation this week so I'd rather not bother him to open another PR, and it would be nice to have the commit associated with all the comments here.

Hrm. That sounds a little messy? We can try, but I don't think we can roll it back.

@zwass
Copy link
Member Author

zwass commented Feb 7, 2023

I tried the things I could... adding Marcos as "co-author" and clarifying in the commit message seems to be the best we can do right now.

@zwass zwass merged commit 01d66ff into osquery:master Feb 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
events Related to osquery's evented tables or eventing subsystem feature virtual tables Windows
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants