Comparing changes
base repository: git/git
base: dc97afdcb93ca683bf73b4ae2ff028c161206617
head repository: git/git
compare: fcacc2b161b095c99dfd4e0b05dcc1ed8ca80a62
- 8 commits
- 17 files changed
- 2 contributors
Commits on Mar 5, 2024
Merge branch 'ps/reftable-iteration-perf-part2' into ps/reftable-reflog-iteration-perf

* ps/reftable-iteration-perf-part2:
  refs/reftable: precompute prefix length
  reftable: allow inlining of a few functions
  reftable/record: decode keys in place
  reftable/record: reuse refname when copying
  reftable/record: reuse refname when decoding
  reftable/merged: avoid duplicate pqueue emptiness check
  reftable/merged: circumvent pqueue with single subiter
  reftable/merged: handle subiter cleanup on close only
  reftable/merged: remove unnecessary null check for subiters
  reftable/merged: make subiters own their records
  reftable/merged: advance subiter on subsequent iteration
  reftable/merged: make `merged_iter` structure private
  reftable/pq: use `size_t` to track iterator index

Commit 2efe795

refs/reftable: reload correct stack when creating reflog iter

When creating a new reflog iterator, we first have to reload the stack that the iterator is being created for. This is done so that any concurrent writes to the stack are reflected. But `reflog_iterator_for_stack()` always reloads the main stack, which is wrong. Fix this and reload the correct stack.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Commit eea0d11

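The bug class fixed by "reload correct stack when creating reflog iter" can be illustrated with a minimal sketch. All names below (`sketch_stack`, `sketch_ref_store`, `other_stack`, `sketch_reflog_iter_for_stack`) are hypothetical stand-ins, not the real refs/reftable types, and the reload is reduced to a counter so the effect is observable.

```c
#include <assert.h>

/* Hypothetical stand-in for a reftable stack; reload just counts. */
struct sketch_stack {
	unsigned reload_count;
};

/* Assumption: a store owns a main stack plus at least one other stack. */
struct sketch_ref_store {
	struct sketch_stack main_stack;
	struct sketch_stack other_stack;
};

void sketch_stack_reload(struct sketch_stack *stack)
{
	stack->reload_count++;
}

/*
 * Prepare `stack` for a new reflog iterator and return it.
 * The buggy variant would call sketch_stack_reload(&refs->main_stack)
 * no matter which stack the caller passed in.
 */
struct sketch_stack *sketch_reflog_iter_for_stack(struct sketch_ref_store *refs,
						  struct sketch_stack *stack)
{
	assert(refs && stack);
	sketch_stack_reload(stack); /* reload the stack we iterate over */
	return stack;
}
```

With the fixed variant, creating an iterator for `other_stack` leaves the reload count of `main_stack` untouched.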
reftable/record: convert old and new object IDs to arrays

In 7af607c (reftable/record: store "val1" hashes as static arrays, 2024-01-03) and b31e3cc (reftable/record: store "val2" hashes as static arrays, 2024-01-03) we converted ref records to store their object IDs in a static array. Convert log records to do the same so that their old and new object IDs are arrays, too.

This change results in two fewer allocations per log record that we're iterating over.

Before:

    HEAP SUMMARY:
        in use at exit: 13,473 bytes in 122 blocks
      total heap usage: 8,068,495 allocs, 8,068,373 frees, 401,011,862 bytes allocated

After:

    HEAP SUMMARY:
        in use at exit: 13,473 bytes in 122 blocks
      total heap usage: 6,068,489 allocs, 6,068,367 frees, 361,011,822 bytes allocated

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Commit 87ff723

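As a rough illustration of why embedding the object IDs saves two allocations per record, here is a minimal sketch. The struct, the function name, and the 32-byte maximum hash size are assumptions made for this example; they are not the actual reftable definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: 32 bytes covers the largest supported hash. */
#define SKETCH_MAX_HASH_SIZE 32

/* Hypothetical log record: both object IDs live inside the struct. */
struct sketch_log_record {
	unsigned char old_hash[SKETCH_MAX_HASH_SIZE];
	unsigned char new_hash[SKETCH_MAX_HASH_SIZE];
};

/*
 * Copy the old and new object IDs from `in` into the record.
 * Returns the number of bytes consumed, or -1 if the input is too short.
 * Nothing is allocated, so decoding many records costs no mallocs here.
 */
int sketch_decode_hashes(struct sketch_log_record *rec,
			 const unsigned char *in, size_t in_len,
			 size_t hash_size)
{
	assert(rec && in);
	assert(hash_size <= SKETCH_MAX_HASH_SIZE);
	if (in_len < 2 * hash_size)
		return -1;
	memcpy(rec->old_hash, in, hash_size);
	memcpy(rec->new_hash, in + hash_size, hash_size);
	return (int)(2 * hash_size);
}
```

The trade-off is a larger record: every copy of the struct now carries both arrays, whether or not they are filled.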
reftable/record: avoid copying author info

Each reflog entry contains authorship information about who made the change. This authorship information is not the same as that of any of the commits that the reflog entry references, but instead corresponds to the local user who executed the command. Thus, it is almost always the case that all reflog entries have the same author.

We can make use of this fact when decoding reftable records: instead of freeing and then reallocating the authorship information of log records, we can special-case when the next record during an iteration has the exact same authorship as the preceding record. If so, then there is no need to reallocate the respective fields.

This change results in two fewer allocations per log record that we're iterating over in the most common case.

Before:

    HEAP SUMMARY:
        in use at exit: 13,473 bytes in 122 blocks
      total heap usage: 6,068,489 allocs, 6,068,367 frees, 361,011,822 bytes allocated

After:

    HEAP SUMMARY:
        in use at exit: 13,473 bytes in 122 blocks
      total heap usage: 4,068,487 allocs, 4,068,365 frees, 332,011,793 bytes allocated

An alternative would be to store the capacity of both name and email and then use `REFTABLE_ALLOC_GROW()` to conditionally reallocate the array. But reftable records are copied around quite a lot, and thus we need to be a bit mindful of the overall record size. Furthermore, a memory comparison should also be more efficient than having to copy over memory even if we wouldn't have to allocate a new array every time.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Commit 01639ec

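The compare-before-reallocate idea from "avoid copying author info" might look roughly like the sketch below. The struct and `sketch_set_field` are hypothetical names for this example, and the return value exists only to make the reuse visible; the real decoder is structured differently.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record holding only the authorship fields. */
struct sketch_author {
	char *name;
	char *email;
};

/*
 * Make *field hold a NUL-terminated copy of the `len` bytes at `src`.
 * If the field already holds exactly those bytes, keep it as is.
 * Returns 0 if the field was reused, 1 if it was reallocated,
 * and -1 if the allocation failed.
 */
int sketch_set_field(char **field, const char *src, size_t len)
{
	char *copy;

	assert(field && src);
	if (*field && strlen(*field) == len && !memcmp(*field, src, len))
		return 0; /* same author as the preceding record */

	copy = malloc(len + 1);
	if (!copy)
		return -1;
	memcpy(copy, src, len);
	copy[len] = '\0';
	free(*field);
	*field = copy;
	return 1;
}
```

No capacity field is needed, which keeps the record small; the cost is one comparison per field per record.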
reftable/record: reuse refnames when decoding log records

When decoding a log record we always reallocate its refname array. This results in quite a lot of needless allocation churn.

Refactor the code to grow the array only as required. This way, we should usually end up reallocating the array only a handful of times when iterating over many refs.

Before:

    HEAP SUMMARY:
        in use at exit: 13,473 bytes in 122 blocks
      total heap usage: 4,068,487 allocs, 4,068,365 frees, 332,011,793 bytes allocated

After:

    HEAP SUMMARY:
        in use at exit: 13,473 bytes in 122 blocks
      total heap usage: 3,068,488 allocs, 3,068,366 frees, 307,122,961 bytes allocated

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Commit 193fcb3

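Growing the refname array only when needed can be sketched as follows. `sketch_record`, `refname_cap`, and `sketch_set_refname` are illustrative names, and plain `realloc()` with a doubling policy stands in for whatever growth helper the real code uses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record that remembers how much space refname has. */
struct sketch_record {
	char *refname;
	size_t refname_cap; /* allocated bytes, including the NUL */
};

/*
 * Store the `len` bytes at `key` as the record's refname.
 * Returns 1 if the buffer had to grow, 0 if it was reused,
 * and -1 if the allocation failed.
 */
int sketch_set_refname(struct sketch_record *rec, const char *key, size_t len)
{
	int grew = 0;

	assert(rec && key);
	if (len + 1 > rec->refname_cap) {
		size_t new_cap = 2 * rec->refname_cap;
		char *p;

		if (new_cap < len + 1)
			new_cap = len + 1;
		p = realloc(rec->refname, new_cap);
		if (!p)
			return -1;
		rec->refname = p;
		rec->refname_cap = new_cap;
		grew = 1;
	}
	memcpy(rec->refname, key, len);
	rec->refname[len] = '\0';
	return grew;
}
```

Once the buffer has grown to fit the longest refname seen so far, later records reuse it without touching the allocator.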
reftable/record: reuse message when decoding log records

As in the preceding commit, we can allocate log messages only as needed when decoding log records, thus further reducing the number of allocations.

Before:

    HEAP SUMMARY:
        in use at exit: 13,473 bytes in 122 blocks
      total heap usage: 3,068,488 allocs, 3,068,366 frees, 307,122,961 bytes allocated

After:

    HEAP SUMMARY:
        in use at exit: 13,473 bytes in 122 blocks
      total heap usage: 2,068,487 allocs, 2,068,365 frees, 305,122,946 bytes allocated

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Commit e0bd13b

reftable/record: use scratch buffer when decoding records

When decoding log records we need a temporary buffer to decode the reflog entry's name, mail address and message. As this buffer is local to the function, we have to reallocate it for every single log record that we're about to decode, which is inefficient.

Refactor the code such that callers need to pass in a scratch buffer, which allows us to reuse it for multiple decodes. This reduces the number of allocations when iterating through reflogs.

Before:

    HEAP SUMMARY:
        in use at exit: 13,473 bytes in 122 blocks
      total heap usage: 2,068,487 allocs, 2,068,365 frees, 305,122,946 bytes allocated

After:

    HEAP SUMMARY:
        in use at exit: 13,473 bytes in 122 blocks
      total heap usage: 1,068,485 allocs, 1,068,363 frees, 281,122,886 bytes allocated

Note that this commit also drops some redundant calls to `strbuf_reset()` right before calling `decode_string()`. The latter already knows to reset the buffer, so there is no need for these.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Commit 7b8abc4

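The caller-owned scratch buffer pattern could look something like this sketch. `sketch_buf` and `sketch_decode_string` are made-up names, and the wire format (a single length byte followed by the string bytes) is an assumption chosen to keep the example short; it is not the reftable encoding.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer owned by the caller across decodes. */
struct sketch_buf {
	char *data;
	size_t len;
	size_t cap;
};

/*
 * Decode one string from `in` into `scratch`.
 * Assumed format: one length byte, then that many bytes.
 * Returns the number of input bytes consumed, or -1 on error.
 */
int sketch_decode_string(struct sketch_buf *scratch,
			 const unsigned char *in, size_t in_len)
{
	size_t n;

	assert(scratch && in);
	if (in_len < 1)
		return -1;
	n = in[0];
	if (in_len < 1 + n)
		return -1;

	scratch->len = 0; /* reset here, so callers need not do it */
	if (n + 1 > scratch->cap) {
		char *p = realloc(scratch->data, n + 1);
		if (!p)
			return -1;
		scratch->data = p;
		scratch->cap = n + 1;
	}
	memcpy(scratch->data, in + 1, n);
	scratch->data[n] = '\0';
	scratch->len = n;
	return (int)(1 + n);
}
```

Because the decoder resets the buffer itself, a caller can pass the same `scratch` for the name, the mail address and the message of every record and free it once at the end.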
refs/reftable: track last log record name via strbuf

The reflog iterator enumerates all reflogs known to a ref backend. In the "reftable" backend there is no way to list all existing reflogs directly. Instead, we have to iterate through all reflog entries and discard all those redundant entries for which we have already returned a reflog entry.

This logic is implemented by tracking the last reflog name that we have emitted to the iterator's user. If the next log record has the same name we simply skip it until we find another record with a different refname.

This last reflog name is stored in a simple C string, which requires us to free and reallocate it whenever we need to update the reflog name. Convert it to use a `struct strbuf` instead, which reduces the number of allocations.

Before:

    HEAP SUMMARY:
        in use at exit: 13,473 bytes in 122 blocks
      total heap usage: 1,068,485 allocs, 1,068,363 frees, 281,122,886 bytes allocated

After:

    HEAP SUMMARY:
        in use at exit: 13,473 bytes in 122 blocks
      total heap usage: 68,485 allocs, 68,363 frees, 256,234,072 bytes allocated

Note that even after this change we still allocate quite a lot of data, even though the number of allocations does not scale with the number of log records anymore. This remainder comes mostly from decompressing the log blocks, where we decompress each block into newly allocated memory. This will be addressed at a later point in time.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Commit fcacc2b

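The skip-duplicates logic described in "track last log record name via strbuf" can be sketched as below. The commit uses `struct strbuf`; this example uses a hand-rolled reusable buffer instead so that it compiles on its own, and the iterator type and function name are hypothetical. It assumes the log records arrive grouped by refname.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical iterator over the refnames of all log records. */
struct sketch_iter {
	const char **names; /* one entry per log record, grouped by ref */
	size_t nr;
	size_t pos;
	char *last;         /* name of the last reflog we emitted */
	size_t last_cap;    /* allocated bytes behind `last` */
};

/*
 * Return the next distinct reflog name, or NULL when there is none.
 * For brevity, this sketch also returns NULL on allocation failure.
 */
const char *sketch_next_reflog(struct sketch_iter *it)
{
	assert(it);
	while (it->pos < it->nr) {
		const char *name = it->names[it->pos++];
		size_t len = strlen(name);

		if (it->last && !strcmp(it->last, name))
			continue; /* another entry of a reflog already emitted */

		if (len + 1 > it->last_cap) {
			char *p = realloc(it->last, len + 1);
			if (!p)
				return NULL;
			it->last = p;
			it->last_cap = len + 1;
		}
		memcpy(it->last, name, len + 1); /* includes the NUL */
		return it->last;
	}
	return NULL;
}
```

The buffer behind `last` only grows when a longer name shows up, so the number of allocations depends on name lengths rather than on the number of log records.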
You can try running this command locally to see the comparison on your machine:
git diff dc97afdcb93ca683bf73b4ae2ff028c161206617...fcacc2b161b095c99dfd4e0b05dcc1ed8ca80a62