pdf with "pseudo" encryption #1196

Open
JAF84 opened this issue Mar 5, 2024 · 13 comments


JAF84 commented Mar 5, 2024

hello,

i also tested this with the latest release (1.3.0).

see the attached PDF as a sample; i have a lot of samples like this.

52n31op9ob2on.pdf

in this case the PDF is encrypted, but it does not ask for a password and all images are visible in any PDF viewer,
so some objects are encrypted, but no special password is necessary to decrypt them.

clamav extracts every object of the PDF, but they are still encrypted, so they are useless for finding anything useful inside.
you can see the objects with "clamscan --debug --leave-temps=yes --tempdir=1.tmp ..."

so of course clamav should also decrypt these files in order to scan the parts...

br johannes

Author

JAF84 commented Mar 5, 2024

I believe this object means that the key is stored in the PDF file...

36 0 obj
<<
/CF <</StdCF <</CFM /AESV2
/Length 16
/Type /CryptFilter>>>>
/EncryptMetadata false
/Filter /Standard
/Length 128
/O <12C8E19723067F3F573A569162793847A399164D0ABD07C378E264D04385DE6C>
/P -3904
/R 4
/StmF /StdCF
/StrF /StdCF
/U <8DB1952FBC37B941D71F1E81F508A629A50EDB71F2423300B31F50D70AF2A721>
/V 4

endobj
xref
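
The `/P -3904` entry in the dictionary above is a 32-bit permission bitmask. As a rough sketch, it could be decoded as follows. The bit positions are taken from the standard security handler description in the PDF 1.7 reference as I understand it, and `decode_permissions` is a hypothetical helper for illustration, not ClamAV code.

```python
# Hypothetical helper (not part of ClamAV): decode the /P permission flags.
# Bit numbers are 1-based, following the PDF 1.7 reference's convention.
PERMISSION_BITS = {
    3: "print",
    4: "modify",
    5: "copy_or_extract",
    6: "annotate_or_fill_forms",
    9: "fill_forms",
    10: "extract_for_accessibility",
    11: "assemble",
    12: "print_high_quality",
}


def decode_permissions(p: int) -> dict:
    """Map each named permission to True (allowed) or False (denied)."""
    flags = p & 0xFFFFFFFF  # /P is written as a signed 32-bit integer
    return {
        name: bool(flags & (1 << (bit - 1)))
        for bit, name in PERMISSION_BITS.items()
    }


if __name__ == "__main__":
    for name, allowed in decode_permissions(-3904).items():
        print(f"{name}: {'allowed' if allowed else 'denied'}")
```

For the value in this sample every listed permission decodes as denied, which would fit a file that is restricted but still opens without a password.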

Contributor

ragusaa commented Mar 12, 2024

Hi,

Thank you for notifying us about this, and I am sorry for the delay in responding to you.

Looking at our metadata, the scanner recognizes that this file contains an encrypted image that is decryptable, but the image appears to be extracted without being decrypted.

According to pdfimages, this image is of type portable pixmap (ppm).

I am opening a ticket internally to track this issue and get it scheduled for the future. We'll update this issue when it is scheduled.

If you could provide some of your other samples, we would appreciate it.

Thanks,
Andy

Author

JAF84 commented Mar 13, 2024

hello andy,

attached are some more samples.

ce2kg7bptpo7e.pdf
v554fqz6krwme.pdf
7a2ljrwiskbk.pdf
eezo7xs89c.pdf
zikjqtdw51x5uuk.pdf

these are of course unwanted spam PDFs.

But there are also legitimate PDFs which have this "pseudo-encryption",
so using this PDF feature does not generally mean that the PDF is a bad one...

br johannes

Author

JAF84 commented Mar 13, 2024

hello,

now i can tell you more about this: the encryption is applied when you protect the PDF, e.g. to make it non-printable.

see the attached samples

1.pdf => without encryption
2.pdf => encryption, but no password necessary => so could/should be checked...
3.pdf => encryption, password necessary

these can easily be created with pdftk on linux:
pdftk 1.pdf output 2.pdf owner_pw 1234
pdftk 1.pdf output 3.pdf owner_pw 1234 user_pw 4321
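
For anyone who wants to regenerate such samples in bulk, the two commands above could be wrapped in a small script. This is only a sketch: it assumes `pdftk` is on the PATH, and the helper names `pdftk_encrypt_cmd` and `make_samples` are made up for illustration.

```python
import subprocess


def pdftk_encrypt_cmd(src, dst, owner_pw, user_pw=None):
    """Build the pdftk argument list used in the commands above."""
    cmd = ["pdftk", src, "output", dst, "owner_pw", owner_pw]
    if user_pw is not None:
        # A user password is what makes viewers prompt before opening.
        cmd += ["user_pw", user_pw]
    return cmd


def make_samples(src="1.pdf"):
    """Create 2.pdf (owner password only) and 3.pdf (owner + user password)."""
    subprocess.run(pdftk_encrypt_cmd(src, "2.pdf", "1234"), check=True)
    subprocess.run(pdftk_encrypt_cmd(src, "3.pdf", "1234", "4321"), check=True)


if __name__ == "__main__":
    make_samples()
```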

br johannes

Contributor

ragusaa commented Mar 13, 2024

That's great, thank you for the samples, and instructions on where this is coming from. We have some other pdf tasks planned, so hopefully we can get this addressed as part of that work.

Thanks,
Andy

Author

JAF84 commented Mar 13, 2024

btw: this is also very interesting:

clamscan.exe --alert-encrypted=yes *pdf

1.pdf: OK
2.pdf: OK
3.pdf: Heuristics.Encrypted.PDF FOUND

so clamav already detects a difference between 2.pdf and 3.pdf...
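
To compare verdicts like these across many samples, the per-file result lines could be parsed with a few lines of Python. This is a sketch that assumes the `<path>: OK` / `<path>: <name> FOUND` line shape shown above; `parse_clamscan_lines` is a hypothetical helper name.

```python
def parse_clamscan_lines(lines):
    """Return {path: detection name, or None if the file was reported OK}."""
    results = {}
    for line in lines:
        line = line.strip()
        if ": " not in line:
            continue
        # Split on the last ": " so paths containing colons stay intact.
        path, verdict = line.rsplit(": ", 1)
        if verdict == "OK":
            results[path] = None
        elif verdict.endswith(" FOUND"):
            results[path] = verdict[: -len(" FOUND")]
        # Anything else (e.g. summary lines) is ignored.
    return results


if __name__ == "__main__":
    sample = [
        "1.pdf: OK",
        "2.pdf: OK",
        "3.pdf: Heuristics.Encrypted.PDF FOUND",
    ]
    for path, name in parse_clamscan_lines(sample).items():
        print(path, "->", name or "clean")
```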

Contributor

ragusaa commented Mar 13, 2024

I haven't had a chance to play with the new files yet, but I would imagine 3.pdf would not have 'decryptable' in the json output.

Contributor

ragusaa commented Mar 13, 2024

Just checked. 2.pdf is decryptable, 3.pdf is not.

Author

JAF84 commented Mar 14, 2024

hello Andy,

i have now also checked with clamav 1.3.0.
when i scan 2.pdf you are right, it shows "pdf_find_and_extract_objs: encrypted pdf found, decryptable!"

LibClamAV debug: cli_pdf: U: : a95f5a7083f9fb99bb158fcd70e503db00000000000000000000000000000000
LibClamAV debug: cli_pdf: O: : dd027d75bab3642ffd6d1b4a2020e2df0022ff603ae18bfb6769f36dd5800bfa
LibClamAV debug: check_owner_password: Unknown or unsupported encryption version. R: 3
LibClamAV debug: check_owner_password: encrypted PDF found but cannot decrypt with empty owner password
LibClamAV debug: cli_pdf: U: : a95f5a7083f9fb99bb158fcd70e503db00000000000000000000000000000000
LibClamAV debug: cli_pdf: O: : dd027d75bab3642ffd6d1b4a2020e2df0022ff603ae18bfb6769f36dd5800bfa
LibClamAV debug: cli_pdf: md5: f57ac02ebae3c6f4fd80ca480c0db974
LibClamAV debug: cli_pdf: Candidate encryption key: f57ac02ebae3c6f4fd80ca480c0db974
LibClamAV debug: cli_pdf: fileID: 2bc8cb8f258e5c34c306e9bdf5ac31e7
LibClamAV debug: cli_pdf: computed U (R>=3): a95f5a7083f9fb99bb158fcd70e503db
LibClamAV debug: check_user_password: user password is empty
LibClamAV debug: pdf_find_and_extract_objs: encrypted pdf found, decryptable!
LibClamAV debug: Bytecode executing hook id 258 (0 hooks)
LibClamAV debug: Bytecode: no logical signature matched, no bytecode executed
LibClamAV debug: pdf_find_and_extract_objs: (parsed hooks) returned 0
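
The log above already contains what is needed to see why the user password counts as empty: the "computed U (R>=3)" value equals the first 16 bytes of the stored U entry. A small sketch that pulls those fields out of a debug log and makes the comparison is shown below. It assumes the line format shown above, and the helper names are hypothetical.

```python
import re

# Matches the cli_pdf debug lines shown above and captures the hex value.
FIELD_RE = re.compile(
    r"cli_pdf: (U|O|md5|fileID|computed U \(R>=3\)|Candidate encryption key)"
    r":[ :]*([0-9a-f]+)\s*$"
)


def parse_pdf_debug(log_text):
    """Collect the hex fields printed by the PDF parser (last value wins)."""
    fields = {}
    for line in log_text.splitlines():
        match = FIELD_RE.search(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def user_password_is_empty(fields):
    """For R>=3 only the first 16 bytes (32 hex digits) of U are significant."""
    return fields["U"][:32] == fields["computed U (R>=3)"]
```

Run against the log above, this should report an empty user password, matching the `check_user_password` line.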

when i use --leave-temps=yes with 1.pdf, i can see the "hello world" object in the temp files.

but with 2.pdf the extracted temp files are all still encrypted... and so useless.
so it's not possible to create signatures for the PDF parts...

i have now also tested a PDF with an image.
i used --leave-temps to get the image file and created a hash-based signature of it.

the unencrypted file was marked as infected after that.
then i used "pdftk 1.pdf output 2.pdf owner_pw 1234" to encrypt it.

clamav told me "decryptable", but did not mark the file as infected.

so maybe clamav is able to decrypt it, but does not use the decrypted parts for some reason?

br johannes

Contributor

ragusaa commented Mar 14, 2024

I think the 'LibClamAV debug: check_owner_password: Unknown or unsupported encryption version. R: 3' message is the problem. When that statement is printed in our pdf parser, it does not attempt to decrypt that block, but the decryptable flag is still printed because we should be able to decrypt.

We have some other planned work to do on the pdf parser, so hopefully we can get this implemented as part of that.

Thank you for digging into this!

Author

JAF84 commented Mar 15, 2024

i have a lot of different samples here, but many of them contain business data,
which i cannot post here.

but if there is a beta version to test, please let me know...

anyway, if the file is encrypted like 2.pdf,
even if this is only to deny printing, and clamav fails to decrypt it
=> then it should also be marked as "Heuristics.Encrypted.PDF" or something similar...

because maybe there are also other encryptions which clamav fails to decrypt,
or maybe in the future there will be a new way to encrypt pdf files...

can you also think about this?

Author

JAF84 commented Mar 15, 2024

btw: if you think this is the problem:
"Unknown or unsupported encryption version. R: 3"

this should be easy to fix, if revision 4 is already working?
see the PDF Reference 1.7:
https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/pdfreference1.7old.pdf
pages 125 and 126.

there are some additional steps to do
if the revision is 3+ or 4+...
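
As a sketch of what those revision-dependent steps look like, here is my reading of the key-computation and user-password algorithms for the standard security handler in the linked reference. This is not ClamAV's implementation; the function names are hypothetical, and it covers only the RC4/MD5-based revisions 2 to 4.

```python
import hashlib
import struct

# 32-byte password padding string from the PDF reference.
PAD = bytes.fromhex(
    "28bf4e5e4e758a4164004e56fffa0108"
    "2e2e00b6d0683e802f0ca9fe6453697a"
)


def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4, written out because the standard library has none."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def derive_key(password, o_entry, p, file_id, revision,
               key_bits=128, encrypt_metadata=True):
    """Derive the file encryption key from a user password (b"" if empty)."""
    n = 5 if revision == 2 else key_bits // 8
    md5 = hashlib.md5((password + PAD)[:32])       # pad or truncate to 32 bytes
    md5.update(o_entry)                            # /O entry
    md5.update(struct.pack("<I", p & 0xFFFFFFFF))  # /P, little-endian
    md5.update(file_id)                            # first element of /ID
    if revision >= 4 and not encrypt_metadata:
        md5.update(b"\xff\xff\xff\xff")            # extra step for revision 4+
    digest = md5.digest()
    if revision >= 3:
        for _ in range(50):                        # extra step for revision 3+
            digest = hashlib.md5(digest[:n]).digest()
    return digest[:n]


def compute_u_r3(key, file_id):
    """First 16 bytes of the /U entry for revision 3 and 4."""
    out = rc4(key, hashlib.md5(PAD + file_id).digest())
    for i in range(1, 20):
        out = rc4(bytes(b ^ i for b in key), out)
    return out
```

With the `/O`, `/P` and file ID of a sample, calling `derive_key(b"", ...)` and comparing `compute_u_r3(...)` against the first 16 bytes of `/U` would be one way to check for an empty user password.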

br johannes

Contributor

ragusaa commented Mar 15, 2024

Unfortunately, we have a few other high-priority tasks that we need to address before we can get started on this. There is some other PDF work we need to do, so we plan on fixing this as part of that work.

I'll definitely let you know when there is something to test on your other samples.

Andy
