-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Threshold cache/v11 #11077
Threshold cache/v11 #11077
Conversation
Thresholding often has 2 stages: 1. recording matches 2. appling an action, like suppress E.g. with something like: threshold:type limit, count 10, seconds 3600, track by_src; the recording state is about counting 10 first hits for an IP, then followed by the "suppress" state that might last an hour. By_src/by_dst are expensive, as they do a host table lookup and lock the host. If many threads require this access, lock contention becomes a serious problem. This patch adds a thread local cache to avoid the synchronization overhead. When the threshold for a host enters the "apply" stage, a thread local hash entry is added. This entry knows the expiry time and the action to apply. This way the action can be applied w/o the synchronization overhead. A rbtree is used to handle expiration.
Packet pointer is not used during allocation.
Add a callback and helper function to handle data expiration. Update datasets to explicitly not use expiration.
Instead of a Host and IPPair table thresholding layer, use a dedicated THash to store both. This allows hashing on host+sid+tracker or ippair+sid+tracker, to create more unique hash keys. This allows for fewer hash collisions. The per rule tracking also uses this, so that the single big lock is no longer a single point of contention. Ticket: OISF#426.
Use the same hash key as for the regular threshold storage, so include gid, rev, tentant id.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #11077 +/- ##
==========================================
+ Coverage 83.63% 83.64% +0.01%
==========================================
Files 922 922
Lines 250375 250403 +28
==========================================
+ Hits 209399 209455 +56
+ Misses 40976 40948 -28
Flags with carried forward coverage won't be shown. Click here to find out more. |
Information: ERROR: QA failed on SURI_TLPR1_suri_time.
Pipeline 20648 |
} | ||
|
||
SCLogDebug("timeout: ending: %u entries expired", cnt); | ||
return cnt; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Victor, I just realized we're never updating cnt
in this fn. That looks off to me.
Apparently, it wasn't updated earlier either f05b4c8
There are no users of cnt
either which is why I think it has worked out ok.
Q: Do we really need to return anything from this fn?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixing.
Currently the returned value isn't used indeed, but it should still be fixed.
if (cnt++ > 2) | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will run the above code 4 times now..
Is that the intention?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not seeing how you'd get to 4. Can see 3 though, fixing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like this: http://tpcg.io/_ALOLST
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks.
Replaced by #11171 |
Reimplement threshold storage and add a thread local cache, both to reduce lock contention.
https://redmine.openinfosecfoundation.org/issues/426
https://redmine.openinfosecfoundation.org/issues/6967
Replaces #11026:
Review in isolation, no relation other thresholding PRs.