Comparing changes

base repository: git/git
base: dc97afdcb93ca683bf73b4ae2ff028c161206617
...
head repository: git/git
compare: fcacc2b161b095c99dfd4e0b05dcc1ed8ca80a62
  • 8 commits
  • 17 files changed
  • 2 contributors

Commits on Mar 5, 2024

  1. Merge branch 'ps/reftable-iteration-perf-part2' into ps/reftable-reflog-iteration-perf

    * ps/reftable-iteration-perf-part2:
      refs/reftable: precompute prefix length
      reftable: allow inlining of a few functions
      reftable/record: decode keys in place
      reftable/record: reuse refname when copying
      reftable/record: reuse refname when decoding
      reftable/merged: avoid duplicate pqueue emptiness check
      reftable/merged: circumvent pqueue with single subiter
      reftable/merged: handle subiter cleanup on close only
      reftable/merged: remove unnecessary null check for subiters
      reftable/merged: make subiters own their records
      reftable/merged: advance subiter on subsequent iteration
      reftable/merged: make `merged_iter` structure private
      reftable/pq: use `size_t` to track iterator index
    gitster committed Mar 5, 2024
    2efe795
  2. refs/reftable: reload correct stack when creating reflog iter

    When creating a new reflog iterator, we first have to reload the stack
    for which the iterator is being created. This is done so that any
    concurrent writes to the stack are reflected. But
    `reflog_iterator_for_stack()` always reloads the main stack, which is
    wrong.
    
    Fix this and reload the correct stack.
    
    Signed-off-by: Patrick Steinhardt <ps@pks.im>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    pks-t authored and gitster committed Mar 5, 2024
    eea0d11
  3. reftable/record: convert old and new object IDs to arrays

    In 7af607c (reftable/record: store "val1" hashes as static arrays,
    2024-01-03) and b31e3cc (reftable/record: store "val2" hashes as
    static arrays, 2024-01-03) we have converted ref records to store their
    object IDs in a static array. Convert log records to do the same so that
    their old and new object IDs are arrays, too.
    
    This change results in two fewer allocations per log record that we're
    iterating over. Before:
    
        HEAP SUMMARY:
            in use at exit: 13,473 bytes in 122 blocks
          total heap usage: 8,068,495 allocs, 8,068,373 frees, 401,011,862 bytes allocated
    
    After:
    
        HEAP SUMMARY:
            in use at exit: 13,473 bytes in 122 blocks
          total heap usage: 6,068,489 allocs, 6,068,367 frees, 361,011,822 bytes allocated
    
    Signed-off-by: Patrick Steinhardt <ps@pks.im>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    pks-t authored and gitster committed Mar 5, 2024
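The effect of the change above can be illustrated with a minimal sketch. It is not the reftable code: the struct names, `DEMO_HASH_MAX`, and the allocation counter are all hypothetical and exist only to show why inline arrays avoid two heap allocations per decoded record. The drop of 2,000,006 allocations between the two heap summaries would be consistent with roughly one million log records at two allocations each.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_HASH_MAX 32 /* assumption: large enough for a SHA-256 object ID */

static size_t demo_hash_allocs; /* counts heap allocations made by this demo */

static void *demo_hash_malloc(size_t n)
{
    demo_hash_allocs++;
    return malloc(n);
}

/* "Before" layout (hypothetical): each object ID lives in its own heap block. */
struct demo_log_ptr {
    uint8_t *old_hash;
    uint8_t *new_hash;
};

/* "After" layout (hypothetical): both object IDs are stored inline. */
struct demo_log_arr {
    uint8_t old_hash[DEMO_HASH_MAX];
    uint8_t new_hash[DEMO_HASH_MAX];
};

static void demo_release_ptr(struct demo_log_ptr *r)
{
    free(r->old_hash);
    free(r->new_hash);
    r->old_hash = r->new_hash = NULL;
}

/* Decode two consecutive hashes from `in`: two allocations per record. */
static int demo_decode_ptr(struct demo_log_ptr *r, const uint8_t *in,
                           size_t hash_size)
{
    assert(hash_size <= DEMO_HASH_MAX);
    r->old_hash = demo_hash_malloc(hash_size);
    r->new_hash = demo_hash_malloc(hash_size);
    if (!r->old_hash || !r->new_hash) {
        demo_release_ptr(r);
        return -1;
    }
    memcpy(r->old_hash, in, hash_size);
    memcpy(r->new_hash, in + hash_size, hash_size);
    return 0;
}

/* Same decode into the inline arrays: no allocation at all. */
static void demo_decode_arr(struct demo_log_arr *r, const uint8_t *in,
                            size_t hash_size)
{
    assert(hash_size <= DEMO_HASH_MAX);
    memcpy(r->old_hash, in, hash_size);
    memcpy(r->new_hash, in + hash_size, hash_size);
}
```

The trade-off in this sketch is that every record carries room for the largest supported hash, even when a shorter hash is in use.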
    87ff723
  4. reftable/record: avoid copying author info

    Each reflog entry records who made the change. This authorship
    information is not the same as that of any of the commits that the
    reflog entry references, but instead corresponds to the local user who
    executed the command. Thus, it is almost always the case that all
    reflog entries have the same author.
    
    We can make use of this fact when decoding reftable records: instead of
    freeing and then reallocating the authorship information of log records,
    we can special-case when the next record during an iteration has the
    exact same authorship as the preceding record. If so, then there is no
    need to reallocate the respective fields.
    
    This change results in two fewer allocations per log record that we're
    iterating over in the most common case. Before:
    
        HEAP SUMMARY:
            in use at exit: 13,473 bytes in 122 blocks
          total heap usage: 6,068,489 allocs, 6,068,367 frees, 361,011,822 bytes allocated
    
    After:
    
        HEAP SUMMARY:
            in use at exit: 13,473 bytes in 122 blocks
          total heap usage: 4,068,487 allocs, 4,068,365 frees, 332,011,793 bytes allocated
    
    An alternative would be to store the capacity of both name and email and
    then use `REFTABLE_ALLOC_GROW()` to conditionally reallocate the array.
    But reftable records are copied around quite a lot, and thus we need to
    be a bit mindful of the overall record size. Furthermore, a memory
    comparison should also be more efficient than having to copy over memory
    even if we wouldn't have to allocate a new array every time.
    
    Signed-off-by: Patrick Steinhardt <ps@pks.im>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    pks-t authored and gitster committed Mar 5, 2024
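The special case described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code from the commit: `struct demo_author`, `demo_set_string()`, and the allocation counter are hypothetical names. The idea is to compare the incoming bytes against the string already held by the record and to keep the existing allocation when they match.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the name and email fields of a log record. */
struct demo_author {
    char *name;
    char *email;
};

static size_t demo_author_allocs; /* counts allocations made by this demo */

/*
 * Make *dst hold a copy of src[0..len). If *dst already has exactly that
 * content, keep the existing allocation and do nothing.
 */
static int demo_set_string(char **dst, const char *src, size_t len)
{
    char *copy;

    if (*dst && strlen(*dst) == len && !memcmp(*dst, src, len))
        return 0; /* same author as the preceding record */

    copy = malloc(len + 1);
    if (!copy)
        return -1;
    demo_author_allocs++;
    memcpy(copy, src, len);
    copy[len] = '\0';
    free(*dst);
    *dst = copy;
    return 0;
}

/* Update both fields of the record from freshly decoded strings. */
static int demo_decode_author(struct demo_author *a, const char *name,
                              const char *email)
{
    if (demo_set_string(&a->name, name, strlen(name)) < 0)
        return -1;
    return demo_set_string(&a->email, email, strlen(email));
}

static void demo_author_release(struct demo_author *a)
{
    free(a->name);
    free(a->email);
    a->name = a->email = NULL;
}
```

Decoding the same author twice in a row allocates only on the first call. A record with a different name but the same email reallocates only the name.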
    01639ec
  5. reftable/record: reuse refnames when decoding log records

    When decoding a log record we always reallocate its refname array.
    This results in quite a lot of needless allocation churn.
    
    Refactor the code to grow the array only as required. This way, we
    should usually only end up reallocating the array a small handful of
    times when iterating over many refs. Before:
    
        HEAP SUMMARY:
            in use at exit: 13,473 bytes in 122 blocks
          total heap usage: 4,068,487 allocs, 4,068,365 frees, 332,011,793 bytes allocated
    
    After:
    
        HEAP SUMMARY:
            in use at exit: 13,473 bytes in 122 blocks
          total heap usage: 3,068,488 allocs, 3,068,366 frees, 307,122,961 bytes allocated
    
    Signed-off-by: Patrick Steinhardt <ps@pks.im>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    pks-t authored and gitster committed Mar 5, 2024
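A grow-only buffer of the kind described above might look like the sketch below. The struct, the doubling policy, and the initial capacity of 16 are assumptions made for illustration. The actual code may well use a helper such as `REFTABLE_ALLOC_GROW()` with a different growth policy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical refname holder that remembers how much memory it owns. */
struct demo_refname {
    char *buf;  /* NUL-terminated name, or NULL before the first decode */
    size_t cap; /* number of bytes allocated for buf */
};

static size_t demo_refname_reallocs; /* counts reallocations in this demo */

/* Store key[0..len) in r, reallocating only when the capacity is too small. */
static int demo_refname_set(struct demo_refname *r, const char *key, size_t len)
{
    if (len + 1 > r->cap) {
        size_t new_cap = r->cap ? r->cap * 2 : 16; /* assumed policy */
        char *p;

        if (new_cap < len + 1)
            new_cap = len + 1;
        p = realloc(r->buf, new_cap);
        if (!p)
            return -1;
        demo_refname_reallocs++;
        r->buf = p;
        r->cap = new_cap;
    }
    memcpy(r->buf, key, len);
    r->buf[len] = '\0';
    return 0;
}

static void demo_refname_release(struct demo_refname *r)
{
    free(r->buf);
    r->buf = NULL;
    r->cap = 0;
}
```

Once the buffer has grown to fit the longest refname seen so far, later records with shorter or equally long names reuse it without touching the allocator.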
    193fcb3
  6. reftable/record: reuse message when decoding log records

    As in the preceding commit, we can allocate log messages only as
    needed when decoding log records, thus further reducing the number of
    allocations. Before:
    
        HEAP SUMMARY:
            in use at exit: 13,473 bytes in 122 blocks
          total heap usage: 3,068,488 allocs, 3,068,366 frees, 307,122,961 bytes allocated
    
    After:
    
        HEAP SUMMARY:
            in use at exit: 13,473 bytes in 122 blocks
          total heap usage: 2,068,487 allocs, 2,068,365 frees, 305,122,946 bytes allocated
    
    Signed-off-by: Patrick Steinhardt <ps@pks.im>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    pks-t authored and gitster committed Mar 5, 2024
    e0bd13b
  7. reftable/record: use scratch buffer when decoding records

    When decoding log records we need a temporary buffer to decode the
    reflog entry's name, mail address and message. As this buffer is local
    to the function, we have to reallocate it for every single log record
    that we decode, which is inefficient.
    
    Refactor the code such that callers need to pass in a scratch buffer,
    which allows us to reuse it for multiple decodes. This reduces the
    number of allocations when iterating through reflogs. Before:
    
        HEAP SUMMARY:
            in use at exit: 13,473 bytes in 122 blocks
          total heap usage: 2,068,487 allocs, 2,068,365 frees, 305,122,946 bytes allocated
    
    After:
    
        HEAP SUMMARY:
            in use at exit: 13,473 bytes in 122 blocks
          total heap usage: 1,068,485 allocs, 1,068,363 frees, 281,122,886 bytes allocated
    
    Note that this commit also drops some redundant calls to
    `strbuf_reset()` right before calling `decode_string()`. The latter
    already knows to reset the buffer, so there is no need for these.
    
    Signed-off-by: Patrick Steinhardt <ps@pks.im>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    pks-t authored and gitster committed Mar 5, 2024
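The caller-provided scratch buffer can be sketched roughly as below. Everything here is illustrative: `struct demo_scratch` stands in for a `struct strbuf`, and the one-byte length prefix is an assumed toy encoding, not the reftable wire format. As with the `decode_string()` behaviour mentioned above, the sketch overwrites the previous contents itself, so callers need no explicit reset between decodes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reusable buffer owned by the caller. */
struct demo_scratch {
    char *buf;
    size_t len;
    size_t cap;
};

static size_t demo_scratch_reallocs; /* counts reallocations in this demo */

/*
 * Decode one string from `in` into the caller's scratch buffer.
 * Assumed toy encoding: one length byte followed by that many bytes.
 * Returns the number of input bytes consumed, or -1 on error.
 */
static int demo_decode_string(struct demo_scratch *s, const unsigned char *in,
                              size_t in_len)
{
    size_t n;

    if (!in_len)
        return -1;
    n = in[0];
    if (n > in_len - 1)
        return -1;
    if (n + 1 > s->cap) {
        char *p = realloc(s->buf, n + 1);
        if (!p)
            return -1;
        demo_scratch_reallocs++;
        s->buf = p;
        s->cap = n + 1;
    }
    memcpy(s->buf, in + 1, n);
    s->buf[n] = '\0';
    s->len = n; /* replaces whatever the previous decode left behind */
    return (int)(n + 1);
}
```

A caller would initialise one `struct demo_scratch` before the iteration, pass it to every decode, and free `buf` once at the end, instead of paying for a fresh buffer per record.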
    7b8abc4
  8. refs/reftable: track last log record name via strbuf

    The reflog iterator enumerates all reflogs known to a ref backend. In
    the "reftable" backend there is no way to list all existing reflogs
    directly. Instead, we have to iterate through all reflog entries and
    discard all those redundant entries for which we have already returned a
    reflog entry.
    
    This logic is implemented by tracking the last reflog name that we have
    emitted to the iterator's user. If the next log record has the same name
    we simply skip it until we find another record with a different refname.
    
    This last reflog name is stored in a simple C string, which requires us
    to free and reallocate it whenever we need to update the reflog name.
    Convert it to use a `struct strbuf` instead, which reduces the number of
    allocations. Before:
    
        HEAP SUMMARY:
            in use at exit: 13,473 bytes in 122 blocks
          total heap usage: 1,068,485 allocs, 1,068,363 frees, 281,122,886 bytes allocated
    
    After:
    
        HEAP SUMMARY:
            in use at exit: 13,473 bytes in 122 blocks
          total heap usage: 68,485 allocs, 68,363 frees, 256,234,072 bytes allocated
    
    Note that even after this change we still allocate quite a lot of data,
    even though the number of allocations does not scale with the number of
    log records anymore. This remainder comes mostly from decompressing the
    log blocks, where we decompress each block into newly allocated memory.
    This will be addressed at a later point in time.
    
    Signed-off-by: Patrick Steinhardt <ps@pks.im>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    pks-t authored and gitster committed Mar 5, 2024
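The skip logic and the reusable "last name" buffer described above can be sketched as follows. This is a hedged illustration rather than the backend's code: `struct demo_last_name` is a hypothetical stand-in for a `struct strbuf`, and the records are modelled as a plain array of refnames that is assumed to be ordered so that entries for the same ref are adjacent.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reusable buffer holding the last emitted reflog name. */
struct demo_last_name {
    char *buf;
    size_t cap;
};

/* Copy `name` into the buffer, growing it only when it is too small. */
static int demo_last_name_set(struct demo_last_name *l, const char *name)
{
    size_t need = strlen(name) + 1;

    if (need > l->cap) {
        char *p = realloc(l->buf, need);
        if (!p)
            return -1;
        l->buf = p;
        l->cap = need;
    }
    memcpy(l->buf, name, need);
    return 0;
}

/*
 * Walk log records (given by refname, entries for one ref adjacent) and
 * store each distinct refname once in `out`. Returns the number of
 * reflogs found, or -1 on error.
 */
static int demo_list_reflogs(const char **records, size_t nr,
                             const char **out, size_t out_cap)
{
    struct demo_last_name last = { NULL, 0 };
    size_t i, n = 0;

    for (i = 0; i < nr; i++) {
        /* Same name as the last emitted reflog: redundant entry, skip. */
        if (last.buf && !strcmp(last.buf, records[i]))
            continue;
        if (n == out_cap || demo_last_name_set(&last, records[i]) < 0) {
            free(last.buf);
            return -1;
        }
        out[n++] = records[i];
    }
    free(last.buf);
    return (int)n;
}
```

Because the buffer only grows, the number of allocations in this sketch depends on the longest refname encountered, not on the number of log records.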
    fcacc2b