bpf: Reset CT entry for CT_REOPENED #32614
Conversation
Force-pushed from cee504a to 81a8e00.
/test
Force-pushed from 81a8e00 to c901696.
Simplified bpf changes a bit.
/test
Force-pushed from c901696 to 130f2ea.
bpf lint fixes.
/test
Nice catch! For clarity: this change would only help TCP connections, correct? I think we need to find a solution for UDP / SCTP as well. But long-term I think the CT lookups from
TCP only, due to this snippet in
Our Envoy proxy redirects are TCP only, so fixing this for TCP suffices to address the Envoy issue for now. This same issue could come up with the DNS proxy though, so it would be good to solve this for UDP (and SCTP) as well.
I'll split off the
Reset the CT entry when an old entry is found while looking up an entry for a TCP SYN packet (CT_REOPENED). This way the old entry has no effect on the entry for the new connection and security identity, and proxy and nodeport related flags get set correctly for the new connection. This prevents a failing proxy redirect when the CT entry of a non-redirected connection would have been reused for a redirected connection, which has happened in CI where an egress L7 policy was applied and an old CT entry from a time without the L7 policy existed. Fixes: cilium#27762 Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Reset the CT entry when an old entry is found while looking up an entry for a TCP SYN packet (CT_REOPENED). This way the old entry has no effect on the entry for the new connection and security identity, and proxy and nodeport related flags get set correctly for the new connection. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Force-pushed from 130f2ea to d813c38.
I considered this approach, but concluded that it would easily result in more brittle code, and that this simpler approach is better, for the following reasons:
Given this, it might be better to get rid of it. As for changes for CT_ESTABLISHED: changes in proxy redirection mid-connection will not work, but currently policy changes break existing connections whose redirection status changes, since the forward direction immediately behaves according to the new policy, while the return direction takes the redirection status from the CT entry, which is not currently updated.
Looks like
/test
```c
# ifdef ENABLE_DSR
	/* Clear .dsr_internal flag for old connections: */
	if (ret == CT_REOPENED && ct_state->dsr_internal)
		ct_update_dsr(get_ct_map4(tuple), tuple, false);
# endif /* ENABLE_DSR */
```
Let's address this part separately (#32642), there's more to it.
This was conditionally setting the flag to false; I removed it as the flag is now false by default. IMO removing this is the right thing to do in this PR (no functional change for the parts that were already updated for CT_REOPENED).
ty! I took a closer look at the nodeport.h changes.
```c
	/* Maybe we can be a bit more selective about CT_REOPENED?
	 * But we have to assume that both the CT and the SNAT entry are stale.
	 */
	case CT_REOPENED:
```
Removing the comment without any explanation of why it no longer applies doesn't feel very helpful.
This PR changes to reset the CT entry for CT_REOPENED, so there is no longer any concern about being more selective?
```diff
-	ct_state_new.dsr_internal = 1;
-	ct_state_new.proxy_redirect = 0;
-	ct_state_new.from_l7lb = 0;
+	ct_state_new.dsr_internal = true;
+	/* all other flags are already false */
```
This is just a cleanup, correct?
Yes, stemming from the fact that other sites were missing these explicit `= 0` statements.
```diff
 	case CT_REOPENED:
+		memset(&ct_state, 0, sizeof(ct_state));
+		ct_state.rev_nat_index = ct_state_svc.rev_nat_index;
+		fallthrough;
 	case CT_NEW:
 		ct_state.src_sec_id = src_sec_identity;
-		ct_state.node_port = 1;
+		ct_state.node_port = true;
 #ifndef HAVE_FIB_IFINDEX
```
I still believe that teaching `ct_entry_matches_types()` about `src_sec_id` and `ifindex` for the case of `CT_ENTRY_NODEPORT` would get us much further here, and would address any concerns for CT_ESTABLISHED.
We could literally just copy the code that fills the `ct_state` in front of the CT lookup, and say "this is what I'm expecting".
So you mean the `ct_*` function would insert the new entry then?
We can't do policy lookup before CT lookup, as we can do policy lookup on a forward direction only, and a CT lookup is needed to know if a packet is in forward or reply direction. Also, some of the CT entry contents come from the policy lookup, so it does not seem feasible to be able to give a full expected CT entry before the CT lookup.
If we follow your change suggestion here, that would be my conclusion as well. We're already clearing quite a few parts of the entry, so why not go all the way. But I don't have sufficient historical context on why we've introduced
Ack. So we need to decide whether this PR is only about fixing the L7 proxy aspect, or wants to address fuzzy CT lookups as a whole.
I would like this PR to be a backportable bug fix for the L7 proxy redirect issue/flake. In that light, maybe I should remove the cleanups as well to make this truly minimal? TBH, I don't know if the nodeport.h changes are absolutely necessary for the L7 proxy redirect fix, though.
Here's the commit that added `CT_REOPENED` (#13340):
It looks to me
Reset the CT entry when an old entry is found while looking up an entry for a TCP SYN packet (CT_REOPENED). This way the old entry has no effect on the entry for the new connection and security identity, and proxy and nodeport related flags get set correctly for the new connection.
This prevents a failing proxy redirect when the CT entry of a non-redirected connection would have been reused for a redirected connection, which has happened in CI where an egress L7 policy was applied and an old CT entry from a time without the L7 policy existed.
For this bug to trigger, a pod needs to open a new TCP connection using the same (ephemeral) source port to the same destination before and after a policy change adding an L7 policy applicable to that connection. While this bug has surfaced in CI, it is less likely to be hit when policies are more stable. Given this, we need further discussion about backporting this bug fix to the stable releases.
Fixes: #27762 (see #27762 (comment))