Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

L4 RBS External LB NEG support #2565

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

mmamczur
Copy link
Contributor

This PR contains the implementation of NEG variant of the L4 RBS LoadBalancer

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 23, 2024
@mmamczur mmamczur force-pushed the ext_l4_lb_negs branch 2 times, most recently from e6b84dd to 2117076 Compare May 23, 2024 13:49
@mmamczur
Copy link
Contributor Author

/assign @cezarygerard

@mmamczur
Copy link
Contributor Author

/assign @swetharepakula
/assign @sawsa307

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 28, 2024
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 29, 2024
@mmamczur
Copy link
Contributor Author

/unassign @swetharepakula

@mmamczur
Copy link
Contributor Author

/unassign @sawsa307

@mmamczur
Copy link
Contributor Author

/hold

let's not merge this before the next code freeze ends

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels May 29, 2024
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 19, 2024
@mmamczur mmamczur force-pushed the ext_l4_lb_negs branch 2 times, most recently from f2e439f to cb279cf Compare August 13, 2024 13:56
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 6, 2024
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 6, 2024
@@ -191,15 +191,15 @@ func NewTransactionSyncer(
return syncer
}

func GetEndpointsCalculator(podLister, nodeLister, serviceLister cache.Indexer, zoneGetter *zonegetter.ZoneGetter, syncerKey negtypes.NegSyncerKey, mode negtypes.EndpointsCalculatorMode, logger klog.Logger, enableDualStackNEG bool, syncMetricsCollector *metricscollector.SyncerMetrics, networkInfo *network.NetworkInfo) negtypes.NetworkEndpointsCalculator {
func GetEndpointsCalculator(podLister, nodeLister, serviceLister cache.Indexer, zoneGetter *zonegetter.ZoneGetter, syncerKey negtypes.NegSyncerKey, mode negtypes.EndpointsCalculatorMode, logger klog.Logger, enableDualStackNEG bool, syncMetricsCollector *metricscollector.SyncerMetrics, networkInfo *network.NetworkInfo, l4LBType negtypes.L4LBType) negtypes.NetworkEndpointsCalculator {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this arg list begs for refactor in the future

Comment on lines +37 to +40
// Max number of subsets for NetLB in ExternalTrafficPolicy:Local
maxSubsetSizeNetLBLocal = 1000
// Max number of subsets for NetLB in ExternalTrafficPolicy:Cluster
maxSubsetSizeNetLBCluster = 250
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what subsets sizes were used for netlb multinetworking?

does this change sizes used for netlb multinetworking now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

multinet uses the same sizes as ILB.
I see no harm in changing this, actually this whole effort will improve multinet

// It exposes methods to calculate Network endpoints for GCE_VM_IP NEGs when the service
// uses "ExternalTrafficPolicy: Local" mode.
// In this mode, the endpoints of the NEG are calculated by listing the nodes that host the service endpoints(pods)
// for the given service. These candidate nodes picked as is, if the count is less than the subset size limit(250).
// Otherwise, a subset of nodes is selected.
// In a cluster with nodes node1... node 50. If nodes node10 to node 45 run the pods for a given ILB service, all these
// nodes - node10, node 11 ... node45 will be part of the subset.
type LocalL4ILBEndpointsCalculator struct {
type LocalL4EndpointsCalculator struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

struct name is changing now, but we could have done it in multintworking change, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

return &loadbalancers.L4NetLBSyncResult{Error: fmt.Errorf("Failed to attach L4 External LoadBalancer finalizer to service %s/%s, err %w", service.Namespace, service.Name, err)}
usesNegBackends := false

if lc.enableNEGSupport || utils.HasL4NetLBFinalizerV3(service) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if we enabled neg support, but there is v2 finalizer?

we should not migrate automatically

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah this is actually wrong, thanks for pointing it out. It's a leftover of changing the logic of how we introduce the NEG variant

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I changed this

@cezarygerard
Copy link
Contributor

I have 1 major comment

you should take V2 finalizer into account and not migrate to V3 event if negs are supported.

if err := common.EnsureServiceFinalizer(service, common.NetLBFinalizerV2, lc.ctx.KubeClient, svcLogger); err != nil {
return &loadbalancers.L4NetLBSyncResult{Error: fmt.Errorf("Failed to attach L4 External LoadBalancer finalizer to service %s/%s, err %w", service.Namespace, service.Name, err)}
usesNegBackends := false

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: please add comment above this if explaining why we usesNegBackends

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should not orphan sevices with NetLBFinalizerV3
in casewe had to rollback the feature

if utils.HasL4NetLBFinalizerV3(service) || (lc.enableNEGSupport && !utils.HasL4NetLBFinalizerV2(service) )

please don't hesitate to use brackets, even if they are redundant

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I extracted this to a separate function which should be easier to follow, it was becoming a mess.

@@ -560,6 +560,101 @@ func TestProcessMultinetServiceCreate(t *testing.T) {
deleteNetLBService(lc, svc)
}

func TestProcessNEGServiceCreate(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it make sense to add quick test on scenario with service using V3 finalizer but lc.enableNEGSupport = false

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a few tests including one for what you describe.
Others are mostly testing 'rollback like' scenarios

@cezarygerard
Copy link
Contributor

good job Michał!

lgtm overall

please read my comments

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 11, 2024
pkg/l4lb/l4netlbcontroller.go Outdated Show resolved Hide resolved
if usesNegBackends {
ensureFinalizer = common.NetLBFinalizerV3
}
if err := common.EnsureServiceFinalizer(service, ensureFinalizer, lc.ctx.KubeClient, svcLogger); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will change finalizer on existing multinet services, did you consider it?
OTOH probably there is no way to avoid it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it won't, multinet is treated differently for now - usesNegBackends won't be true for them, there is a separate condition below. But it would be good to move them to V3 at some point and unify that logic.

pkg/l4lb/l4netlbcontroller_test.go Outdated Show resolved Hide resolved
linkType = negLink
}

if err = lc.ensureBackendLinking(service, linkType, svcLogger); err != nil {
lc.ctx.Recorder(service.Namespace).Eventf(service, v1.EventTypeWarning, "SyncExternalLoadBalancerFailed",
"Error linking instance groups to backend service, err: %v", err)
"Error linking backends to backend service, err: %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if Neg is not provisioned yet?

syncInternal should fail and retry immediately, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, the same mechanism as for ILB

@@ -123,3 +125,26 @@ func EnsureDeleteServiceFinalizer(service *corev1.Service, key string, kubeClien
svcLogger.V(2).Info("Removing finalizer from service", "finalizerKey", key)
return patch.PatchServiceObjectMetadata(kubeClient.CoreV1(), service, *updatedObjectMeta)
}

// EnsureServiceDeleteFinalizers patches the service to ensure the specified finalizers are not present in the service finalizers list.
// This function is needed if more than one finalikzer has to be removed since you can't invoke the 1 param version multiple times.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// This function is needed if more than one finalizer has to be removed since it is very inefficient to invoke 1-param version multiple times.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modified, discussed offline

@cezarygerard
Copy link
Contributor

I only have few readability comments

the logic looks good to me

@cezarygerard
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 12, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cezarygerard, mmamczur

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [cezarygerard,mmamczur]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 12, 2024
@k8s-ci-robot
Copy link
Contributor

New changes are detected. LGTM label has been removed.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 18, 2024
Adds 2 new flags related to NetLB RBS NEG support.
`--enable-l4-netlb-neg` that will allow to create NEGs for L4 NetLBs.
`--enable-l4-netlb-neg-default` which will make all new RBS services use NEG backends.
NetLBFinalizerV3 will be added to the L4 RBS NetLBs that opt-in for NEG support.
When enabled, the NEG controller will create NEGs for L4 RBS NetLBs that are marked by the L4 NetLB controller to be NEG based.
When the feature is enabled the controller can create NEG backed LBs. When the 'NEG default flag' is enabled, all new RBS NetLB services will be created with NEG backends.
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants