Skip to content
Permalink

Comparing changes

Choose two branches to see what’s changed or to start a new pull request. If you need to, you can also or learn more about diff comparisons.

Open a pull request

Create a new pull request by comparing changes across two branches. If you need to, you can also . Learn more about diff comparisons here.
base repository: git/git
base: 330ed38a2df0d67e247edc7ea69175520ead469d
Choose a base ref
...
head repository: git/git
compare: fffd981ec2d7965733a4a15f9071e3734f7654a6
Choose a head ref
  • 2 commits
  • 2 files changed
  • 1 contributor

Commits on Mar 7, 2024

  1. reftable/record: fix memory leak when decoding object records

    When decoding records it is customary to reuse a `struct
    reftable_ref_record` across calls. Thus, it may happen that the record
    already holds some allocated memory. When decoding ref and log records
    we handle this by releasing or reallocating held memory. But we fail to
    do this for object records, which causes us to leak memory.
    
    Fix this memory leak by releasing object records before we decode into
    them. We may eventually want to reuse memory instead to avoid needless
    reallocations. But for now, let's just plug the leak and be done.
    
    Signed-off-by: Patrick Steinhardt <ps@pks.im>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    pks-t authored and gitster committed Mar 7, 2024
    Configuration menu
    Copy the full SHA
    1a03591 View commit details
    Browse the repository at this point in the history
  2. reftable/block: fix binary search over restart counter

    Records store their keys prefix-compressed. As many records will share a
    common prefix (e.g. "refs/heads/"), this can end up saving quite a bit
    of disk space. The downside of this is that it is not possible to just
    seek into the middle of a block and consume the corresponding record
    because it may depend on prefixes read from preceding records.
    
    To help with this usecase, the reftable format writes every n'th record
    without using prefix compression, which is called a "restart". The list
    of restarts is stored at the end of each block so that a reader can
    figure out entry points at which to read a full record without having to
    read all preceding records.
    
    This allows us to do a binary search over the records in a block when
    searching for a particular key by iterating through the restarts until
    we have found the section in which our record must be located. From
    thereon we perform a linear search to locate the desired record.
    
    This mechanism is broken though. In `block_reader_seek()` we call
    `binsearch()` over the count of restarts in the current block. The
    function we pass to compare records with each other computes the key at
    the current index and then compares it to our search key by calling
    `strbuf_cmp()`, returning its result directly. But `binsearch()` expects
    us to return a truish value that indicates whether the current index is
    smaller than the searched-for key. And unless our key exactly matches
    the value at the restart counter we always end up returning a truish
    value.
    
    The consequence is that `binsearch()` essentially always returns 0,
    indicacting to us that we must start searching right at the beginning of
    the block. This works by chance because we now always do a linear scan
    from the start of the block, and thus we would still end up finding the
    desired record. But needless to say, this makes the optimization quite
    useless.
    
    Fix this bug by returning whether the current key is smaller than the
    searched key. As the current behaviour was correct it is not possible to
    write a test. Furthermore it is also not really possible to demonstrate
    in a benchmark that this fix speeds up seeking records.
    
    This may cause the reader to question whether this binary search makes
    sense in the first place if it doesn't even help with performance. But
    it would end up helping if we were to read a reftable with a much larger
    block size. Blocks can be up to 16MB in size, in which case it will
    become much more important to avoid the linear scan. We are not yet
    ready to read or write such larger blocks though, so we have to live
    without a benchmark demonstrating this.
    
    Signed-off-by: Patrick Steinhardt <ps@pks.im>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    pks-t authored and gitster committed Mar 7, 2024
    Configuration menu
    Copy the full SHA
    fffd981 View commit details
    Browse the repository at this point in the history