Folders inside a forensic image with more than 1 million files being ignored #2136

Open
wladimirleite opened this issue Mar 19, 2024 · 18 comments
Labels
bug dependencies Pull requests that update a dependency file

@wladimirleite
Member

This was found by @hugohmk and I was able to reproduce it.
A folder with one million files on an NTFS volume (inside an E01 image) was silently ignored: there was no error, but its content was not added to the case.
A similar folder, with 600,000 files, was processed correctly.

To identify the problem's cause, I ran the standalone "tsk_loaddb.exe" and observed the same behavior, using TSK 4.12.0 and 4.12.1.
However, when I tried 4.11.1 (the version before 4.12.0), it worked fine.

Having such a huge number of files in the same folder should be an extremely unusual situation, but it is possible and allowed by NTFS.
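
To reproduce a condition like this, a test folder can be populated with many empty files before it is imaged. The following is only a minimal sketch: the thread does not say how the test image was built, and the helper names `create_many_files` and `count_entries` are hypothetical. The entry count can later be compared with what ends up in the case.

```python
import os
import tempfile


def create_many_files(folder: str, count: int) -> None:
    """Create `count` empty files (file_0000000.bin, ...) inside `folder`."""
    os.makedirs(folder, exist_ok=True)
    for i in range(count):
        # Mode "x" fails if the file already exists, so an accidental rerun is noticed.
        with open(os.path.join(folder, f"file_{i:07d}.bin"), "x"):
            pass


def count_entries(folder: str) -> int:
    """Count directory entries lazily, without building a full list in memory."""
    with os.scandir(folder) as entries:
        return sum(1 for _ in entries)


if __name__ == "__main__":
    # Small demo count; the failing case in this issue had 1,000,000 files in one folder.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "big_folder")
        create_many_files(target, 1000)
        print(count_entries(target))
```

For a real test, the count would be raised to one million or more and the folder created on an NTFS volume that is then acquired as an image.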

@lfcnassif
Member

Well, unfortunately, that was an intentional Sleuthkit change... I'll search for the related issue and post it here.

@lfcnassif
Member

sleuthkit/sleuthkit#2787

I agree with those stating it should at least be configurable, e.g. with an environment variable parameter.

We can patch our Sleuthkit fork to implement that, but I'm not happy to maintain a fork...

@lfcnassif
Member

@wladimirleite, while running your tests, did you see big memory consumption issues or excessive slowness?

@wladimirleite
Member Author

@wladimirleite, while running your tests, did you see big memory consumption issues or excessive slowness?

Using 4.11.1, tsk_loaddb.exe took a while (~40 minutes), but I didn't monitor memory.
With IPED processing (and tsk_loaddb.exe 4.12.0), it was very fast (only a minute or so), but the folder and all its files were ignored.

@wladimirleite
Member Author

I agree with those stating it should at least be configurable, e.g. with an environment variable parameter.

Indeed. And maybe an option to abort with an error, instead of just silently ignoring.

We can patch our Sleuthkit fork to implement that, but I'm not happy to maintain a fork...

I don't think it is worth the effort to maintain a fork.
Since @hugohmk provided an E01 containing a folder with 3 million files, I will try to raise the hardcoded limit in TSK and see what happens in terms of memory consumption and slowness.
As I said, this should be very rare, but once we try to process a disk image with such a condition, it can be very dangerous, as a lot of files will be ignored without any clear warning.
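
The two behaviors discussed here (skip with a clear warning, or abort with an error) can be sketched as below. This only illustrates the policy and is not TSK or IPED code: `should_process_folder`, `FolderLimitExceeded` and the `abort_on_exceed` flag are hypothetical names, and treating a count equal to the limit as acceptable is an assumption.

```python
import logging
from typing import Optional

logger = logging.getLogger("folder_limit")


class FolderLimitExceeded(Exception):
    """Raised when a folder exceeds the limit and aborting was requested."""


def should_process_folder(
    path: str,
    entry_count: int,
    limit: Optional[int],
    abort_on_exceed: bool = False,
) -> bool:
    """Return True if the folder should be processed.

    A `limit` of None means "no limit". Counts up to and including the
    limit are accepted (assumption: the real boundary may differ).
    """
    if limit is None or entry_count <= limit:
        return True
    message = f"Folder {path!r} has {entry_count} entries, above the limit of {limit}"
    if abort_on_exceed:
        # Fail loudly instead of dropping the folder content.
        raise FolderLimitExceeded(message)
    # Skipping is still allowed, but never silently.
    logger.warning("%s; its content will be skipped", message)
    return False
```

With either branch, the person running the case learns that content was left out, which is the main concern raised above.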

@lfcnassif
Member

lfcnassif commented Mar 20, 2024

As I said, this should be very rare, but once we try to process a disk image with such a condition, it can be very dangerous, as a lot of files will be ignored without any clear warning.

Did tsk_loaddb.exe 4.12+ write any warning or error to the console? If so, was it recorded in the IPED logs? PS: Today we use the Java bindings, whose output can differ from that of tsk_loaddb. We try to detect critical errors in the TSK Java binding output, log some of them and redirect the others to the IPED console. Maybe we are failing in the error detection part...

@lfcnassif
Member

Using 4.11.1, tsk_loaddb.exe took a while (~40 minutes), but I didn't monitor memory.

Well, I have seen a few cases where tsk_loaddb took a similar amount of time, or more, to decode the FS tree. I'm not sure whether they are related to this issue...

@wladimirleite
Member Author

No output in the console using tsk_loaddb.
I later saw some messages using its verbose command line option, but it generated several GB of log, so it wasn't possible to identify what was going on.

@lfcnassif lfcnassif added the dependencies Pull requests that update a dependency file label Mar 20, 2024
@lfcnassif
Member

@markmckinnon if we send a PR to Sleuthkit making that hardcoded limit configurable, will it be reviewed?

@lfcnassif lfcnassif changed the title Folders (inside a forensic image) with a huge number of files may be ignored Folders (inside a forensic image) with more than 1 million files being ignored ignored Mar 20, 2024
@lfcnassif lfcnassif changed the title Folders (inside a forensic image) with more than 1 million files being ignored ignored Folders (inside a forensic image) with more than 1 million files being ignored Mar 20, 2024
@lfcnassif lfcnassif changed the title Folders (inside a forensic image) with more than 1 million files being ignored Folders inside a forensic image with more than 1 million files being ignored Mar 20, 2024
@markmckinnon

@lfcnassif it should be reviewed and implemented, as this can affect Autopsy as well.

@lfcnassif
Member

Thanks @markmckinnon, we will try to propose a PR to Sleuthkit repo then.

@lfcnassif
Member

Do you think using an environment variable, like TSK_MAX_ENTRIES_PER_FOLDER, is a good approach? If not set, keep the current limit. If set, use the configured value. If set to -1, disable the limit. Does that look fine?
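
The proposed semantics can be sketched as follows. This is an illustration only, not code from TSK: `resolve_max_entries` is a hypothetical helper, `DEFAULT_LIMIT` is an assumed stand-in for the current hardcoded value, and falling back to the default on invalid input is also an assumption.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "TSK_MAX_ENTRIES_PER_FOLDER"  # variable name proposed in this thread
DEFAULT_LIMIT = 1_000_000  # assumption: stand-in for the current hardcoded limit


def resolve_max_entries(env: Mapping[str, str] = os.environ) -> Optional[int]:
    """Return the per-folder entry limit, or None when the limit is disabled."""
    raw = env.get(ENV_VAR)
    if raw is None or raw.strip() == "":
        return DEFAULT_LIMIT  # not set: keep the current limit
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_LIMIT  # assumption: ignore values that are not integers
    if value == -1:
        return None  # -1: limit disabled
    if value < 0:
        return DEFAULT_LIMIT  # assumption: other negative values are invalid
    return value  # set: use the configured value
```

Passing the environment as a mapping keeps the function easy to exercise, for example `resolve_max_entries({"TSK_MAX_ENTRIES_PER_FOLDER": "-1"})` returns `None`.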

@markmckinnon

So each tool that wants a new limit, or no limit, will have to set an environment variable in whatever OS it is running on, instead of using a command line argument?

@lfcnassif
Member

Yes, that would work for us, and only minor code changes would be needed, with no changes to the TSK API or internal data structures. But I understand that it would be a global system configuration affecting all TSK-dependent tools, and different tools may need different configurations...
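
One way to reconcile the two positions is to let an explicit per-tool option take precedence over the global environment variable. The sketch below assumes a hypothetical `--max-entries-per-folder` flag, which no TSK tool is known to provide, and an assumed default of 1,000,000. It only illustrates the precedence order.

```python
import argparse
from typing import Mapping, Optional, Sequence

ENV_VAR = "TSK_MAX_ENTRIES_PER_FOLDER"  # variable name proposed in this thread


def effective_limit(
    argv: Sequence[str],
    env: Mapping[str, str],
    default: int = 1_000_000,  # assumption: stand-in for the hardcoded limit
) -> Optional[int]:
    """Resolve the limit: command line first, then environment, then default.

    Returns None when the limit is disabled with -1.
    """
    parser = argparse.ArgumentParser(add_help=False)
    # Hypothetical flag, used here only to show the precedence order.
    parser.add_argument("--max-entries-per-folder", type=int, default=None)
    args, _unknown = parser.parse_known_args(list(argv))

    if args.max_entries_per_folder is not None:
        value = args.max_entries_per_folder  # per-tool setting wins
    elif ENV_VAR in env:
        value = int(env[ENV_VAR])  # raises ValueError on non-integer input
    else:
        value = default
    return None if value == -1 else value
```

With this order, a tool that needs its own setting passes the option, while tools that pass nothing follow the system-wide variable or the default.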

@lfcnassif lfcnassif added this to To do in 4.2 via automation Apr 2, 2024
@lfcnassif
Member

@wladimirleite did this happen with a real-world image, or was the triggering image an artificial one?

@wladimirleite
Member Author

I confirmed with @hugohmk a while ago: it was an artificial image (generated to test another, unrelated possible performance issue).
I planned to investigate it a bit more and maybe suggest a better solution inside TSK, but I haven't found the time so far, as I am not familiar with its code.

@lfcnassif
Member

lfcnassif commented May 2, 2024

I can implement the environment variable workaround in our fork. I was asking in order to check the priority of this fix and whether it should go into 4.2.0 or not. It seems it is not critical...

@wladimirleite
Member Author

No, I don't think it is critical.

@lfcnassif lfcnassif removed this from To do in 4.2 May 2, 2024