-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ruby/Python/JS/Swift: Add category of Private information to shared sensitive data heuristics #16446
Ruby/Python/JS/Swift: Add category of Private information to shared sensitive data heuristics #16446
Conversation
b3d9a6f
to
23fbfce
Compare
No change note is needed for Swift, as the new heuristics are unused and thus should not affect any queries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the author of the private information regexps for Swift (class SensitivePrivateInfo
), I'm happy to see these expressions being used more widely. They've already had a reasonable amount of testing and iteration in Swift, and I expect they will work pretty well for other languages with minimal or no additional tuning.
@@ -49,6 +49,7 @@ class SensitiveCredential extends SensitiveDataType, TCredential { | |||
exists(SensitiveDataClassification classification | | |||
not classification = SensitiveDataClassification::password() and // covered by `SensitivePassword` | |||
not classification = SensitiveDataClassification::id() and // not accurate enough | |||
not classification = SensitiveDataClassification::private() and // covered by `SensitivePrivateInfo` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see you've excluded the new results from Swift for now. Since they're a near duplicate of Swift's existing class SensitivePrivateInfo
I think we should aim (now or later) to define that in terms of the new result classification instead, similarly to how class SensitivePassword
already works. This may add a few new results (for SSN and CCN) in the four or five Swift sensitive data queries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... I don't mind doing this as follow-up actually. I already made half the changes I'll need to run a test of the SSN and CCN terms!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR for Swift: #16570
print(debit_card) # NOT OK | ||
print(bank_number) # NOT OK | ||
print(bank_account) # NOT OK, but NOT FOUND - "account" is treated as having the "id" classification and thus excluded. | ||
print(accountNo) # NOT OK, but NOT FOUND - "account" is treated as having the "id" classification and thus excluded. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These NOT FOUND
results are good observations. We can think about mechanisms for limiting the exclusions perhaps and finding more results in cases like these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should that be considered for this PR or as potential follow-up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As potential follow-up I'd say. There will always be more improvements we can make to heuristics like this.
@geoffw0 What are the next steps for this? Should I just be waiting for an approval from each language team? |
@joefarebrother: Could you summarise what changed results your evaluations showed. |
@erik-krogh There are new results for New results in ruby sensitive get looks like a few TPs from new heuristics. (The MRVA run that restricts to only new heuristics shows some apparent FPs, but they are also present in the run of the full version of the query prior to changes; and appear to be the result of some nodes being spuriously detected as calls to several different methods including those matching sensitive heuristics) Python cleartext storage and weak sensitive hashing have a few new results for which the sources look like TPs from new heuristics. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JS 👍
But we should probably do a followup where this implementation is moved to the shared folder (together with CryptoAlgorithmNames
?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@geoffw0 What are the next steps for this? Should I just be waiting for an approval from each language team?
Yes I think so.
Ruby 👍 - the new results seem pretty reasonable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
I see this as a clear improvement of what we already have, but in the context of py/weak-sensitive-data-hashing
, I'll note that credit card numbers have such a limited input space that they need computationally expensive hash functions (like passwords do) -- so that's a future improvement we should make to that query (would probably require credit card information getting its' own category instead of being part of the new private category)
Adds a category of private information to the shared sensitive data heuristics file.
This may result in new results for the following queries:
rb/sensitive-get-query
py/clear-text-storage-sensitive-data
py/clear-text-logging-sensitive-data
py/weak-sensitive-data-hashing
js/clear-text-stotage-sensitive-data
js/clear-text-logging