Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Threshold cache/v11 #11077

Closed
wants to merge 15 commits into from
Closed

Conversation

victorjulien
Copy link
Member

Reimplement threshold storage and add a thread local cache, both to reduce lock contention.

https://redmine.openinfosecfoundation.org/issues/426
https://redmine.openinfosecfoundation.org/issues/6967

Replaces #11026:

  • adds config opts
  • doc for config opts
  • address review comments

Review in isolation, no relation other thresholding PRs.

Thresholding often has 2 stages:

1. recording matches
2. appling an action, like suppress

E.g. with something like:
threshold:type limit, count 10, seconds 3600, track by_src;
the recording state is about counting 10 first hits for an IP,
then followed by the "suppress" state that might last an hour.

By_src/by_dst are expensive, as they do a host table lookup and lock
the host. If many threads require this access, lock contention becomes
a serious problem.

This patch adds a thread local cache to avoid the synchronization
overhead. When the threshold for a host enters the "apply" stage,
a thread local hash entry is added. This entry knows the expiry
time and the action to apply. This way the action can be applied
w/o the synchronization overhead.

A rbtree is used to handle expiration.
Packet pointer is not used during allocation.
Add a callback and helper function to handle data expiration.

Update datasets to explicitly not use expiration.
Instead of a Host and IPPair table thresholding layer, use a dedicated
THash to store both. This allows hashing on host+sid+tracker or
ippair+sid+tracker, to create more unique hash keys.

This allows for fewer hash collisions.

The per rule tracking also uses this, so that the single big lock is no
longer a single point of contention.

Ticket: OISF#426.
Use the same hash key as for the regular threshold storage,
so include gid, rev, tentant id.
@victorjulien victorjulien requested review from jufajardini and a team as code owners May 15, 2024 09:17
Copy link

codecov bot commented May 15, 2024

Codecov Report

Attention: Patch coverage is 87.71930% with 56 lines in your changes are missing coverage. Please review.

Project coverage is 83.64%. Comparing base (806052d) to head (0ac580b).

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #11077      +/-   ##
==========================================
+ Coverage   83.63%   83.64%   +0.01%     
==========================================
  Files         922      922              
  Lines      250375   250403      +28     
==========================================
+ Hits       209399   209455      +56     
+ Misses      40976    40948      -28     
Flag Coverage Δ
fuzzcorpus 64.20% <25.26%> (-0.02%) ⬇️
livemode 18.54% <12.63%> (+0.13%) ⬆️
suricata-verify 62.74% <50.00%> (-0.05%) ⬇️
unittests 62.35% <78.94%> (+0.09%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

@suricata-qa
Copy link

Information:

ERROR: QA failed on SURI_TLPR1_suri_time.

field baseline test %
SURI_TLPR1_stats_chk
.uptime 645 672 104.19%

Pipeline 20648

}

SCLogDebug("timeout: ending: %u entries expired", cnt);
return cnt;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Victor, I just realized we're never updating cnt in this fn. That looks off to me.
Apparently, it wasn't updated earlier either f05b4c8
There are no users of cnt either which is why I think it has worked out ok.
Q: Do we really need to return anything from this fn?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixing.

Currently the returned value isn't used indeed, but it should still be fixed.

Comment on lines +299 to +300
if (cnt++ > 2)
break;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will run the above code 4 times now..
Is that the intention?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not seeing how you'd get to 4. Can see 3 though, fixing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks.

@victorjulien
Copy link
Member Author

Replaced by #11171

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
3 participants