Repeated make helm-delete-vpn helm-install-vpn makes it impossible to install any NSC
#2255
Comments
Thanks for filing this bug. I am attaching some logs; they contain failures from around the time the bug was reproduced. I am also attaching a minor nsm patch to run with an ubuntu-20.04 box and to increase the RAM size for the vagrant VMs. That patch has nothing to do with the failure per se. The failure logs are from the forwarder on node 4 and node 1 (suffixes 4 and 1), together with the nsm-manager container logs from the same nodes. Node 4 hosts the NSC and node 1 hosts the gateway NSE pod.
@karthick18 Question: is there a reason you are using NSM v0.2 vs NSM v1.0?
Not really. Just wanted to use the latest. Having said that, I think I also did one against v1.0 and saw the same thing. Not completely sure but can try it again.
I got sidetracked from this issue as it wasn't really a blocker. However, I have since moved to the nsm v0.1.0 tag and reproduced it there as well.
The patch addresses 3 things: 2 panics in nsmd, and a lockup in the vpp server that seems to be the result of a configurator race in the dataplane between Request and Close. The lockup in vpp results in vpp connect failures and in the dataplane failing with ErrConnectionFailed to the network service manager. When this happens, vppctl also hangs on connect to the unix socket. However, I still need to confirm whether the 2 nsmd patches mentioned below would prevent the vpp lockup (highly unlikely).
The first one is simple and is based on a missing networkservice endpoint resulting in traversal of null endpoints. This was from Update cross connects.
time="2021-08-18T20:48:24Z" level=info msg="Connection with Remote Network Service kube-worker1 at 10.44.0.3:5001 is established"
panic: interface conversion: nsm.NSMConnection is connection.Connection, not connection.Connection (types from different packages)
goroutine 1427 [running]:
The other straightforward panic, also addressed in the v0.2.0 branch, is the result of traversing null endpoints returned from find networkservice endpoints in 1.0 while restoring connections.
2021/08/18 19:56:08 Reporting span 1532223ecb7f9611:1532223ecb7f9611:0:1
goroutine 1256 [running]:
#!/usr/bin/env bash
The fix on v0.1.0 for the issues to make the test successful is:
@@ -775,6 +776,7 @@ func (srv *networkServiceManager) RestoreConnections(xcons []*crossconnect.Cross
Also attaching, as a file, the patch on the v1.0 branch that is inlined in the comment above.
Expected Behavior
Repeated make helm-delete-vpn helm-install-vpn should delete the previous vpn-case NSC and NSE and install new ones.
Current Behavior
After some time, repeating make helm-delete-vpn helm-install-vpn starts failing to install the vpn-case NSC, and then any other NSC.
Failure Information (for bugs)
Steps to Reproduce
make helm-delete-vpn helm-install-vpn.
Important: make helm-install-vpn should be executed while the last instance of vpn (nsc/nse) is being deleted/terminating. If you wait after helm-delete-vpn before running helm-install-vpn, the test seems to run fine multiple times.
Context
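For context, the reproduction steps above can be scripted as a loop. This is only a minimal sketch, not the script used in the report: the function name repro_loop, the default iteration count, and the MAKE override variable are all hypothetical, and it assumes that a failed install shows up as a non-zero exit status from make, which the report does not confirm.

```shell
#!/usr/bin/env bash
# Sketch: run delete + install back to back, with no wait in between,
# until one run fails. Names and defaults here are assumptions.

repro_loop() {
  local iterations="${1:-10}"      # hypothetical default, not from the report
  local make_cmd="${MAKE:-make}"   # overridable so the loop can be dry-run
  local i=1
  while [ "$i" -le "$iterations" ]; do
    echo "iteration ${i}"
    # Both targets in a single invocation, as in the report, so the install
    # starts right after the delete is issued.
    if ! "$make_cmd" helm-delete-vpn helm-install-vpn; then
      echo "failed at iteration ${i}" >&2
      return 1
    fi
    i=$((i + 1))
  done
}
```

Calling repro_loop 20 from the directory that provides these make targets would run up to 20 rounds and stop at the first failing one. Setting MAKE=echo prints the commands instead of running them.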
Failure Logs