Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v1.15] fix: correctly set scope_switch for service lookup #32608

Merged
merged 1 commit into from
May 22, 2024

Conversation

sypakine
Copy link

When looking up a loadbalancer service in lxc from-container, scope_switch value for flows matching an existing conntrack entry may skip the correct return path. Set scope switch value to be consistent with earlier service lookup.

Note: this change only applies to v1.15 and below, as v1.16 has been refactored to prevent the lb4_lookup_service call.

This can lead to packet drops due to failed service lookup when a stale CT entry maps to a backend ID that is no longer present in the service map.

Fixes: #32472

Fix: LB service lookup for flow matching conntrack entry 

@sypakine sypakine requested a review from a team as a code owner May 17, 2024 14:48
@maintainer-s-little-helper maintainer-s-little-helper bot added backport/1.15 This PR represents a backport for Cilium 1.15.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. labels May 17, 2024
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label May 17, 2024
@julianwiedmann julianwiedmann self-requested a review May 21, 2024 05:10
@julianwiedmann julianwiedmann added kind/bug This is a bug in the Cilium logic. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. area/loadbalancing Impacts load-balancing and Kubernetes service implementations and removed kind/backports This PR provides functionality previously merged into master. labels May 21, 2024
Copy link
Member

@julianwiedmann julianwiedmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! One change request inline, to make this work for both in-cluster and cluster-external usage.

Some context, as the patch description & release note aren't very clear on the affected scenario / wider impact. For eTP=local services, we add two entries to the service map - one with only the local backends, and a second entry with all backends. In-cluster access is meant to use the wider-scope entry.

But due to this bug, the in-cluster access to eTP=local services doesn't fall through to the "all backends" entry. As result, a client on a node with no local backend would be unable to fall back, in case its initial backend terminates.

bpf/lib/lb.h Outdated Show resolved Hide resolved
For eTP=local services, we add two entries to the service map - one with only the local backends, and a second entry with all backends. In-cluster access is meant to use the wider-scope entry.

Due to incorrect scope switch, in-cluster access to eTP=local services doesn't fall through to the "all backends" entry. As result, a client on a node with no local backend would be unable to fall back, in case its initial backend terminates.

Note: this change only applies to v1.15 and below, as v1.16 has been refactored to prevent the lb4_lookup_service call.

Fixes: cilium#32472

Signed-off-by: Mark St John <markstjohn@google.com>
@julianwiedmann
Copy link
Member

/test-backport-1.15

@julianwiedmann julianwiedmann self-requested a review May 21, 2024 15:58
Copy link
Member

@julianwiedmann julianwiedmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thank you! Let's see what CI says ...

@julianwiedmann julianwiedmann added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 21, 2024
@sypakine sypakine changed the title fix: correctly set scope_switch for lb4 service lookup fix: correctly set scope_switch for service lookup May 21, 2024
@julianwiedmann julianwiedmann changed the title fix: correctly set scope_switch for service lookup [v1.15] fix: correctly set scope_switch for service lookup May 22, 2024
@lmb lmb merged commit c53f4ff into cilium:v1.15 May 22, 2024
59 checks passed
@sypakine
Copy link
Author

@julianwiedmann I'm not sure this is considered a major bug for the backport criteria -- thoughts?

@julianwiedmann
Copy link
Member

@julianwiedmann I'm not sure this is considered a major bug for the backport criteria -- thoughts?

I'm seeing it as more of a corner-case, that wouldn't justify backports to older branches. And I'd expect v1.15 to be quite mature at this point, so affected users could easily upgrade and pick up the fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/loadbalancing Impacts load-balancing and Kubernetes service implementations backport/1.15 This PR represents a backport for Cilium 1.15.x of a PR that was merged to main. kind/bug This is a bug in the Cilium logic. kind/community-contribution This was a contribution made by a community member. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants