Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check context every 128 labels instead of 100 #14118

Merged
merged 1 commit into from
May 21, 2024

Conversation

colega
Copy link
Contributor

@colega colega commented May 17, 2024

Follow up on #14096

As promised, I bring a benchmark, which shows a very small improvement if context is checked every 128 iterations of label instead of every 100.

It's much easier for a computer to check modulo 128 than modulo 100. This is a very small 0-2% improvement but I'd say this is one of the hottest paths of the app so this is still relevant.

goos: darwin
goarch: arm64
pkg: github.com/prometheus/prometheus/tsdb/index
                                                                   │    main     │                128                 │
                                                                   │   sec/op    │   sec/op     vs base               │
MemPostings_PostingsForLabelMatching/labels=1000/matcher=fast-12     38.36µ ± 1%   38.01µ ± 1%  -0.91% (p=0.003 n=10)
MemPostings_PostingsForLabelMatching/labels=1000/matcher=slow-12     12.65µ ± 1%   12.40µ ± 1%  -2.01% (p=0.000 n=10)
MemPostings_PostingsForLabelMatching/labels=10000/matcher=fast-12    402.8µ ± 2%   395.1µ ± 1%  -1.91% (p=0.000 n=10)
MemPostings_PostingsForLabelMatching/labels=10000/matcher=slow-12    250.1m ± 3%   249.9m ± 0%       ~ (p=0.089 n=10)
MemPostings_PostingsForLabelMatching/labels=100000/matcher=fast-12   3.952m ± 1%   3.930m ± 0%  -0.57% (p=0.000 n=10)
MemPostings_PostingsForLabelMatching/labels=100000/matcher=slow-12    3.283 ± 0%    3.292 ± 1%       ~ (p=0.190 n=10)
geomean                                                              2.931m        2.906m       -0.87%

Follow up on prometheus#14096

As promised, I bring a benchmark, which shows a very small improvement
if context is checked every 128 iterations of label instead of every
100.

It's much easier for a computer to check modulo 128 than modulo 100.
This is a very small 0-2% improvement but I'd say this is one of the
hottest paths of the app so this is still relevant.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Oleg.

@colega
Copy link
Contributor Author

colega commented May 21, 2024

@jesusvazquez nerd-sniped me, so I'll do the same with the rest, here's the difference in assembly between doing modulo 128 and modulo 100:
https://godbolt.org/z/4ETM4W8b7

        TEXT    main.modulo128eq0(SB), NOSPLIT|NOFRAME|ABIInternal, $0-8
        FUNCDATA        $0, gclocals·g2BeySu+wFnoycgXfElmcg==(SB)
        FUNCDATA        $1, gclocals·g2BeySu+wFnoycgXfElmcg==(SB)
        FUNCDATA        $5, main.modulo128eq0.arginfo1(SB)
        FUNCDATA        $6, main.modulo128eq0.argliveinfo(SB)
        PCDATA  $3, $1
        TESTQ   $127, AX
        SETEQ   AL
        RET
        TEXT    main.modulo100eq0(SB), NOSPLIT|NOFRAME|ABIInternal, $0-8
        FUNCDATA        $0, gclocals·g2BeySu+wFnoycgXfElmcg==(SB)
        FUNCDATA        $1, gclocals·g2BeySu+wFnoycgXfElmcg==(SB)
        FUNCDATA        $5, main.modulo100eq0.arginfo1(SB)
        FUNCDATA        $6, main.modulo100eq0.argliveinfo(SB)
        PCDATA  $3, $1
        MOVQ    $-8116567392432202711, CX
        IMULQ   AX, CX
        MOVQ    $368934881474191032, DX
        ADDQ    DX, CX
        ROLQ    $62, CX
        MOVQ    $184467440737095516, DX
        CMPQ    DX, CX
        SETCC   AL
        RET

Copy link
Member

@jesusvazquez jesusvazquez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM learned something new today with this

a % 128 = 0 is the same as a & 127 = 0

Which is a faster operation since it just requires doing the AND and not the whole divison and remainer comparison https://godbolt.org/z/4ETM4W8b7

@jesusvazquez jesusvazquez merged commit fe9cb5a into prometheus:main May 21, 2024
25 checks passed
@cristaloleg
Copy link

Small clarifications: there is no need to explicitly rewrite a % 128 as a & 127 'cause Go compiler does this automatically, see src/cmd/compile/internal/ssa/_gen/generic.rules (look for mod by power of 2 constant.).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants