traefik_open_connections metric drifts down until negative #10733
Comments
Digging a bit more… I don’t really speak Go but I’ll try… Clearly this function RemoveConnection is being called too often, or AddConnection not often enough.
Errr, so something looks off to me, but probably is not. Reproducing the code of RemoveConnection here:

```go
func (c *connectionTracker) RemoveConnection(conn net.Conn) {
	c.connsMu.Lock()
	delete(c.conns, conn)
	c.connsMu.Unlock()

	if c.openConnectionsGauge != nil {
		c.openConnectionsGauge.Add(-1)
	}
}
```

Is it normal that the gauge decrement happens outside of the mutex-protected section? I guess the instrumentation library is already thread-safe, is it?
@rtribotte I see that you authored the highly relevant commit 7c2af10 from PR #9656; might you be able to chip in?
I have this issue too. My metrics:
Hello @navaati, thanks for opening this!
It should indeed be thread-safe (https://github.com/prometheus/client_golang/blob/release-1.17/prometheus/gauge.go#L122C1-L130C2). Have you had a chance to try the alternative (protecting it with the mutex)? Did it fix the bug?
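For reference, a minimal sketch of that alternative (decrementing the gauge while still holding connsMu, based on the function quoted above) could look like this:

```go
func (c *connectionTracker) RemoveConnection(conn net.Conn) {
	c.connsMu.Lock()
	defer c.connsMu.Unlock()

	delete(c.conns, conn)

	// Update the gauge while still holding the lock, so the map removal
	// and the metric decrement cannot interleave with other goroutines.
	if c.openConnectionsGauge != nil {
		c.openConnectionsGauge.Add(-1)
	}
}
```

This only changes where the decrement happens relative to the lock; it does not change how often RemoveConnection is called, which is the other suspect mentioned above.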
Hi. No, I haven’t tested anything yet: I first need to figure out how to build the binary and then the container, for which I haven’t found instructions yet (at least nothing in the contributing documentation). EDIT: found the doc, https://doc.traefik.io/traefik/contributing/building-testing/! I’ll try to test, although given the link you dug up I don’t have much hope. Is it actually the official prom client being used here, though? I see the type of the gauge is…
Yeah, no, it’s just an abstraction layer over the official prom client: https://github.com/go-kit/kit/blob/dfe43fa6a8d72c23e2205d0b80e762346e203f78/metrics/prometheus/prometheus.go#L84.
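For context, here is a minimal sketch of how that go-kit wrapper is typically used. The metric name and label below are illustrative, not necessarily Traefik’s exact ones; the point is that the wrapper’s Add delegates straight to the underlying prometheus gauge, so concurrency safety comes from the official client:

```go
package main

import (
	kitprometheus "github.com/go-kit/kit/metrics/prometheus"
	stdprometheus "github.com/prometheus/client_golang/prometheus"
)

func main() {
	// NewGaugeFrom registers a GaugeVec with the default registry and
	// returns a go-kit Gauge that wraps it.
	openConns := kitprometheus.NewGaugeFrom(stdprometheus.GaugeOpts{
		Name: "open_connections_example", // illustrative name, not Traefik's
		Help: "Example gauge wrapped by go-kit.",
	}, []string{"entrypoint"})

	// Add goes straight through to the prometheus gauge, which is safe
	// for concurrent use.
	openConns.With("entrypoint", "web").Add(1)
	openConns.With("entrypoint", "web").Add(-1)
}
```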
What did you do?
I configured Prometheus monitoring on a Traefik instance with --metrics.prometheus=true on a separate entrypoint (for a separate port), with no other Prometheus-specific configuration, so I got the default set of metrics. I can query /metrics all right, and I would expect traefik_open_connections to stay at 5 (as that’s my load: 5 clients from a load testing tool).
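For reference, a static configuration matching that description might look roughly like the following; the entrypoint name "metrics" and the port 8082 are assumptions for illustration, not taken from the report:

```
--entrypoints.metrics.address=:8082
--metrics.prometheus=true
--metrics.prometheus.entrypoint=metrics
```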
What did you see instead?
The value of the traefik_open_connections metric goes down over time, until it even gets negative, which doesn’t make sense for a connection count (how can you have -1 open connections…).
See this excerpt from the /metrics endpoint:
and this screenshot from Prometheus:
In the screenshot, the metric is supposed to stay at a constant 5 connections.
As proof that there are actually still 5 connections, here is the output of ss (where we also see the connections to the backend service):
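The captured output is not reproduced here; a listing like it can be generated with a command along these lines, where filtering on port 80 is an assumption about the entrypoint in use:

```
# List established TCP connections with numeric addresses and owning processes.
ss -tnp state established '( sport = :80 )'
```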
What version of Traefik are you using?

What is your environment & configuration?
The labels on the backend container discovered by the docker provider:
The load on the service is a constant 5 clients repeatedly requesting the service, generated by Locust (a Python load testing tool). The backend is a Python app that can handle one client at a time and takes around 110 ms per request, so with a load of 5 clients it handles around 9 req/s with a latency of 550 ms.
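(For reference, the arithmetic behind those figures: with a backend serving one request at a time in about 110 ms, throughput is roughly 1 / 0.110 s ≈ 9 req/s, and with 5 clients queued each request waits about 5 × 110 ms ≈ 550 ms.)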
If applicable, please paste the log output in DEBUG level
No response