ccl/multiregionccl: TestMultiRegionDataDriven failed #124359

Open
github-actions bot opened this issue May 17, 2024 · 5 comments
Labels
branch-release-24.1: Used to mark GA and release blockers and technical advisories for 24.1
C-test-failure: Broken test (automatically or manually discovered).
O-robot: Originated from a bot.
P-2: Issues/test failures with a fix SLA of 3 months
T-sql-foundations: SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

github-actions bot commented May 17, 2024

ccl/multiregionccl.TestMultiRegionDataDriven failed on release-24.1 @ bf5e0e7cafb7537cecab2f70aafd1b4274172e86:

             0.572ms      0.021ms                        event:kv/kvclient/kvcoord/dist_sender.go:2637 [n6,client=127.0.0.1:54446,hostssl,user=root,txn=8bc14484] r69: sending batch 1 Get to (n6,s6):10NON_VOTER
             0.583ms      0.011ms                        event:rpc/nodedialer/nodedialer.go:138 [n6,client=127.0.0.1:54446,hostssl,user=root,txn=8bc14484] sending request to local client
             0.591ms      0.008ms                        event:kv/kvclient/kvcoord/transport.go:213 [n6,client=127.0.0.1:54446,hostssl,user=root,txn=8bc14484] sending batch request
             0.596ms      0.004ms                            === operation:/cockroach.roachpb.Internal/Batch _verbose:1 node:6 span.kind:server request:Get [/Table/108/1/1/0], [txn: 8bc14484], [can-forward-ts], [max_span_request_keys: 0], [target_bytes: 10485760]
             0.602ms      0.006ms                            event:server/node.go:1402 [n6] node received request: 1 Get
             0.625ms      0.023ms                            event:kv/kvserver/store_send.go:154 [n6,s6] executing Get [/Table/108/1/1/0], [txn: 8bc14484], [can-forward-ts], [max_span_request_keys: 0], [target_bytes: 10485760]
             0.634ms      0.009ms                            event:kv/kvserver/replica_send.go:180 [n6,s6,r69/10:/{Table/108-Max}] read-only path
             0.643ms      0.009ms                            event:kv/kvserver/concurrency/concurrency_manager.go:214 [n6,s6,r69/10:/{Table/108-Max}] sequencing request
             0.649ms      0.006ms                            event:kv/kvserver/concurrency/concurrency_manager.go:295 [n6,s6,r69/10:/{Table/108-Max}] acquiring latches
             0.657ms      0.008ms                            event:kv/kvserver/concurrency/concurrency_manager.go:339 [n6,s6,r69/10:/{Table/108-Max}] scanning lock table for conflicting locks
             0.689ms      0.032ms                            event:kv/kvserver/replica_follower_read.go:118 [n6,s6,r69/10:/{Table/108-Max}] can't serve follower read; closed timestamp too low by: 1.07062859s; maxClosed: 1715970537.093035200,0 ts: 1715970537.663663790,0 uncertaintyLimit: 1715970538.163663790,0
             0.744ms      0.055ms                            event:kv/kvserver/replica_send.go:199 [n6,s6,r69/10:/{Table/108-Max}] replica.Send got error: [NotLeaseHolderError] lease held by different store; r69: replica (n6,s6):10NON_VOTER not lease holder; current lease is repl=(n1,s1):1 seq=1 start=0,0 epo=1 pro=1715970506.521614979,0
             0.783ms      0.040ms                            event:server/node.go:1459 [n6] error from stores.Send: [NotLeaseHolderError] lease held by different store; r69: replica (n6,s6):10NON_VOTER not lease holder; current lease is repl=(n1,s1):1 seq=1 start=0,0 epo=1 pro=1715970506.521614979,0: "sql txn" meta={id=8bc14484 key=/Min iso=Serializable pri=0.00806334 epo=0 ts=1715970537.663663790,0 min=1715970537.663663790,0 seq=0} lock=false stat=PENDING rts=1715970537.663663790,0 wto=false gul=1715970538.163663790,0
             0.792ms      0.009ms                            event:server/node.go:1496 [n6] node sending response
             0.820ms      0.229ms                        event:kv/kvclient/kvcoord/transport.go:215 [n6,client=127.0.0.1:54446,hostssl,user=root,txn=8bc14484] received batch response
             0.861ms      0.041ms                        event:kv/kvclient/kvcoord/dist_sender.go:2866 [n6,client=127.0.0.1:54446,hostssl,user=root,txn=8bc14484] [NotLeaseHolderError] lease held by different store; r69: replica (n6,s6):10NON_VOTER not lease holder; current lease is repl=(n1,s1):1 seq=1 start=0,0 epo=1 pro=1715970506.521614979,0: "sql txn" meta={id=8bc14484 key=/Min iso=Serializable pri=0.00806334 epo=0 ts=1715970537.663663790,0 min=1715970537.663663790,0 seq=0} lock=false stat=PENDING rts=1715970537.663663790,0 wto=false gul=1715970538.163663790,0
             0.903ms      0.042ms                        event:kv/kvclient/kvcoord/dist_sender.go:3115 [n6,client=127.0.0.1:54446,hostssl,user=root,txn=8bc14484] application error: [NotLeaseHolderError] lease held by different store; r69: replica (n6,s6):10NON_VOTER not lease holder; current lease is repl=(n1,s1):1 seq=1 start=0,0 epo=1 pro=1715970506.521614979,0: "sql txn" meta={id=8bc14484 key=/Min iso=Serializable pri=0.00806334 epo=0 ts=1715970537.663663790,0 min=1715970537.663663790,0 seq=0} lock=false stat=PENDING rts=1715970537.663663790,0 wto=false gul=1715970538.163663790,0
             0.919ms      0.016ms                        event:kv/kvclient/kvcoord/dist_sender.go:2640 [n6,client=127.0.0.1:54446,hostssl,user=root,txn=8bc14484] trying next peer (n1,s1):1
             0.930ms      0.011ms                        event:rpc/nodedialer/nodedialer.go:147 [n6,client=127.0.0.1:54446,hostssl,user=root,txn=8bc14484] sending request to 127.0.0.1:41551
             0.940ms      0.010ms                        event:kv/kvclient/kvcoord/transport.go:213 [n6,client=127.0.0.1:54446,hostssl,user=root,txn=8bc14484] sending batch request
             0.942ms      0.002ms                            === operation:/cockroach.roachpb.Internal/Batch _verbose:1 node:6 client:127.0.0.1:54446 hostssl: user:root txn:8bc14484 span.kind:client
             1.053ms      0.111ms                            === operation:/cockroach.roachpb.Internal/Batch _verbose:1 node:1 span.kind:server request:Get [/Table/108/1/1/0], [txn: 8bc14484], [can-forward-ts], [max_span_request_keys: 0], [target_bytes: 10485760]
             1.064ms      0.012ms                            event:server/node.go:1402 [n1] node received request: 1 Get
             1.086ms      0.022ms                            event:kv/kvserver/store_send.go:154 [n1,s1] executing Get [/Table/108/1/1/0], [txn: 8bc14484], [can-forward-ts], [max_span_request_keys: 0], [target_bytes: 10485760]
             1.094ms      0.008ms                            event:kv/kvserver/replica_send.go:180 [n1,s1,r69/1:/{Table/108-Max}] read-only path
             1.103ms      0.009ms                            event:kv/kvserver/concurrency/concurrency_manager.go:214 [n1,s1,r69/1:/{Table/108-Max}] sequencing request
             1.109ms      0.006ms                            event:kv/kvserver/concurrency/concurrency_manager.go:295 [n1,s1,r69/1:/{Table/108-Max}] acquiring latches
             1.117ms      0.007ms                            event:kv/kvserver/concurrency/concurrency_manager.go:339 [n1,s1,r69/1:/{Table/108-Max}] scanning lock table for conflicting locks
             1.139ms      0.022ms                            event:kv/kvserver/replica_read.go:312 [n1,s1,r69/1:/{Table/108-Max}] can drop latches early for batch (Get [/Table/108/1/1/0], [txn: 8bc14484], [can-forward-ts], [max_span_request_keys: 0], [target_bytes: 10485760]); scanning lock table first to detect conflicts
             1.160ms      0.021ms                            event:kv/kvserver/replica_read.go:124 [n1,s1,r69/1:/{Table/108-Max}] lock table scan complete without conflicts; dropping latches early
             1.171ms      0.010ms                            event:kv/kvserver/replica_read.go:425 [n1,s1,r69/1:/{Table/108-Max}] executing read-only batch
             1.222ms      0.051ms                            event:kv/kvserver/replica_evaluate.go:512 [n1,s1,r69/1:/{Table/108-Max}] evaluated Get command header:<key:"\364\211\211\210" > , txn="sql txn" meta={id=8bc14484 key=/Min iso=Serializable pri=0.00806334 epo=0 ts=1715970537.663663790,0 min=1715970537.663663790,0 seq=0} lock=false stat=PENDING rts=1715970537.663663790,0 wto=false gul=1715970538.163663790,0 : resp=header:<num_keys:1 num_bytes:7 > value:<raw_bytes:"\375\311[\010\n#\002" timestamp:<wall_time:1715970514451360965 > > , err=<nil>
             1.241ms      0.019ms                            event:scan stats: stepped 0 times (0 internal); seeked 1 times (1 internal); block-bytes: (total 8.3 KiB, cached 8.3 KiB, duration 0s); points: (count 1, key-bytes 14 B, value-bytes 7 B, tombstoned: 0) ranges: (count 0), (contained-points 0, skipped-points 0) evaluated requests: 1 gets, 0 scans, 0 reverse scans
             1.250ms      0.009ms                            event:kv/kvserver/replica_read.go:237 [n1,s1,r69/1:/{Table/108-Max}] read completed
             1.264ms      0.014ms                            event:server/node.go:1496 [n1] node sending response
             1.225ms     -0.039ms                            structured:{"@type":"type.googleapis.com/cockroach.roachpb.ScanStats","numInterfaceSeeks":"1","numInternalSeeks":"1","blockBytes":"8515","blockBytesInCache":"8515","keyBytes":"14","valueBytes":"7","pointCount":"1","numGets":"1"}
             1.434ms      0.495ms                        event:kv/kvclient/kvcoord/transport.go:215 [n6,client=127.0.0.1:54446,hostssl,user=root,txn=8bc14484] received batch response
             1.471ms      0.986ms                    event:kv/kvclient/kvcoord/txn_interceptor_span_refresher.go:661 [n6,client=127.0.0.1:54446,hostssl,user=root,txn=8bc14484] recording span to refresh: /Table/108/1/1/0
             1.518ms      1.060ms            event:ComponentStats{ID: {764b638f-ca96-40fb-9c13-fd51ccf1e08e PROCESSOR 0 6 }, KV time: 1ms, KV contention time: 0µs, KV rows decoded: 1, KV bytes read: 11 B, KV gRPC calls: 1, KV pairs read: 1, MVCC step count (ext/int): 0/0, MVCC seek count (ext/int): 1/1, max memory allocated: 20 KiB, sql cpu time: 12µs, batches output: 1, rows output: 1}
             1.525ms      0.007ms            event:ComponentStats{ID: {00000000-0000-0000-0000-000000000000 UNSET 0 ? }}
             1.497ms     -0.028ms            structured:{"@type":"type.googleapis.com/cockroach.sql.distsqlrun.ComponentStats","component":{"flowId":"764b638f-ca96-40fb-9c13-fd51ccf1e08e","type":"PROCESSOR","sqlInstanceId":6},"netRx":{"latency":{},"waitTime":{},"deserializationTime":{},"tuplesReceived":{},"bytesReceived":{},"messagesReceived":{}},"netTx":{"tuplesSent":{},"bytesSent":{},"messagesSent":{}},"kv":{"bytesRead":{"valuePlusOne":"12"},"kvPairsRead":{"valuePlusOne":"2"},"tuplesRead":{"valuePlusOne":"2"},"batchRequestsIssued":{"valuePlusOne":"2"},"kvTime":{"valuePlusOne":"0.001029478s"},"contentionTime":{"valuePlusOne":"0.000000001s"},"numInterfaceSteps":{"valuePlusOne":"1"},"numInternalSteps":{"valuePlusOne":"1"},"numInterfaceSeeks":{"valuePlusOne":"2"},"numInternalSeeks":{"valuePlusOne":"2"},"blockBytes":{"valuePlusOne":"8516"},"blockBytesInCache":{"valuePlusOne":"8516"},"keyBytes":{"valuePlusOne":"15"},"valueBytes":{"valuePlusOne":"8"},"pointCount":{"valuePlusOne":"2"},"pointsCoveredByRangeTombstones":{"valuePlusOne":"1"},"rangeKeyCount":{"valuePlusOne":"1"},"rangeKeyContainedPoints":{"valuePlusOne":"1"},"rangeKeySkippedPoints":{"valuePlusOne":"1"},"kvCpuTime":{},"numGets":{"valuePlusOne":"2"},"numScans":{"valuePlusOne":"1"},"numReverseScans":{"valuePlusOne":"1"}},"exec":{"execTime":{},"maxAllocatedMem":{"valuePlusOne":"20481"},"maxAllocatedDisk":{},"consumedRu":{"valuePlusOne":"1"},"cpuTime":{"valuePlusOne":"0.000011891s"}},"output":{"numBatches":{"valuePlusOne":"2"},"numTuples":{"valuePlusOne":"2"}},"flowStats":{"maxMemUsage":{},"maxDiskUsage":{},"consumedRu":{}}}
             1.520ms      0.023ms            structured:{"@type":"type.googleapis.com/cockroach.sql.distsqlrun.ComponentStats","component":{"flowId":"00000000-0000-0000-0000-000000000000"},"netRx":{"latency":{},"waitTime":{},"deserializationTime":{},"tuplesReceived":{},"bytesReceived":{},"messagesReceived":{}},"netTx":{"tuplesSent":{},"bytesSent":{},"messagesSent":{}},"kv":{"bytesRead":{},"kvPairsRead":{},"tuplesRead":{},"batchRequestsIssued":{},"kvTime":{},"contentionTime":{},"numInterfaceSteps":{},"numInternalSteps":{},"numInterfaceSeeks":{},"numInternalSeeks":{},"blockBytes":{},"blockBytesInCache":{},"keyBytes":{},"valueBytes":{},"pointCount":{},"pointsCoveredByRangeTombstones":{},"rangeKeyCount":{},"rangeKeyContainedPoints":{},"rangeKeySkippedPoints":{},"kvCpuTime":{},"numGets":{},"numScans":{},"numReverseScans":{}},"exec":{"execTime":{},"maxAllocatedMem":{},"maxAllocatedDisk":{},"consumedRu":{},"cpuTime":{}},"output":{"numBatches":{},"numTuples":{}},"flowStats":{"maxMemUsage":{},"maxDiskUsage":{},"consumedRu":{}}}
             1.544ms      1.087ms        event:ComponentStats{ID: {764b638f-ca96-40fb-9c13-fd51ccf1e08e FLOW 0 6 eu-central-1}}
             1.541ms     -0.004ms        structured:{"@type":"type.googleapis.com/cockroach.sql.distsqlrun.ComponentStats","component":{"flowId":"764b638f-ca96-40fb-9c13-fd51ccf1e08e","type":"FLOW","sqlInstanceId":6,"region":"eu-central-1"},"netRx":{"latency":{},"waitTime":{},"deserializationTime":{},"tuplesReceived":{},"bytesReceived":{},"messagesReceived":{}},"netTx":{"tuplesSent":{},"bytesSent":{},"messagesSent":{}},"kv":{"bytesRead":{},"kvPairsRead":{},"tuplesRead":{},"batchRequestsIssued":{},"kvTime":{},"contentionTime":{},"numInterfaceSteps":{},"numInternalSteps":{},"numInterfaceSeeks":{},"numInternalSeeks":{},"blockBytes":{},"blockBytesInCache":{},"keyBytes":{},"valueBytes":{},"pointCount":{},"pointsCoveredByRangeTombstones":{},"rangeKeyCount":{},"rangeKeyContainedPoints":{},"rangeKeySkippedPoints":{},"kvCpuTime":{},"numGets":{},"numScans":{},"numReverseScans":{}},"exec":{"execTime":{},"maxAllocatedMem":{},"maxAllocatedDisk":{},"consumedRu":{},"cpuTime":{}},"output":{"numBatches":{},"numTuples":{}},"flowStats":{"maxMemUsage":{"valuePlusOne":"20481"},"maxDiskUsage":{"valuePlusOne":"1"},"consumedRu":{}}}
             1.568ms      1.160ms    event:sql/conn_executor_exec.go:1150 [n6,client=127.0.0.1:54446,hostssl,user=root] execution ends
             1.576ms      0.009ms    event:sql/conn_executor_exec.go:1150 [n6,client=127.0.0.1:54446,hostssl,user=root] rows affected: 1
             1.619ms      0.042ms        === operation:commit sql txn _verbose:1 node:6 client:127.0.0.1:54446 hostssl: user:root
             1.640ms      0.063ms    event:sql/conn_executor_exec.go:3151 [n6,client=127.0.0.1:54446,hostssl,user=root] AutoCommit. err: <nil>
        
    --- FAIL: TestMultiRegionDataDriven/global_tables (32.20s)

Parameters:

  • attempt=1
  • run=1
  • shard=2
Help

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/sql-foundations

This test on roachdash | Improve this report!

Jira issue: CRDB-38846

github-actions bot added the branch-release-24.1, C-test-failure, O-robot, release-blocker, and T-sql-foundations labels on May 17, 2024
github-actions bot added this to the 24.1 milestone on May 17, 2024
github-actions bot added this to Triage in SQL Foundations on May 17, 2024
rafiss (Collaborator) commented May 21, 2024

        /var/lib/engflow/worker/work/0/exec/bazel-out/k8-fastbuild/bin/pkg/ccl/multiregionccl/multiregionccl_test_/multiregionccl_test.runfiles/com_github_cockroachdb_cockroach/pkg/ccl/multiregionccl/testdata/global_tables:98:
         SELECT * FROM db.global WHERE k = 1
        expected:
        served locally: true
        served via follower read: true
        
        found:
        not a match, trace:
             0.000ms      0.000ms    === operation:traced statement _unfinished:1 _verbose:1 node:6 client:127.0.0.1:54446 hostssl: user:root
             0.000ms      0.000ms    [flow: {count: 1, duration 1ms}]
             0.000ms      0.000ms    [colbatchscan: {count: 1, duration 1ms, unfinished}]
             0.000ms      0.000ms    [batch flow coordinator: {count: 1, duration 1ms}]
             0.000ms      0.000ms    [txn coordinator send: {count: 1, duration 989µs}]
             0.000ms      0.000ms    [dist sender send: {count: 1, duration 953µs}]
             0.000ms      0.000ms    [/cockroach.roachpb.Internal/Batch: {count: 3, duration 875µs}]
             0.000ms      0.000ms    [optimizer: {count: 1, duration 316µs}]
             0.000ms      0.000ms    [commit sql txn: {count: 1, duration 5µs}]
             0.004ms      0.004ms        === operation:optimizer _verbose:1 node:6 client:127.0.0.1:54446 hostssl: user:root
             0.036ms      0.033ms        event:sql/catalog/lease/descriptor_version_state.go:143 [n6,client=127.0.0.1:54446,hostssl,user=root] descriptorVersionState.incRefCount: 100("defaultdb",010180c382fe9b908b44c89850dce01bd13795) ver=1:1715970831.053132266,0, refcount=1
             0.049ms      0.013ms        event:sql/catalog/descs/descriptor.go:137 [n6,client=127.0.0.1:54446,hostssl,user=root] looking up descriptors for ids [100]
             0.066ms      0.017ms        event:sql/catalog/lease/descriptor_version_state.go:143 [n6,client=127.0.0.1:54446,hostssl,user=root] descriptorVersionState.incRefCount: 104("db",010180c382fe9b908b44c89850dce01bd13795) ver=1:1715970833.691733341,0, refcount=1
             0.075ms      0.009ms        event:sql/catalog/descs/descriptor.go:137 [n6,client=127.0.0.1:54446,hostssl,user=root] looking up descriptors for ids [104]
             0.083ms      0.009ms        event:sql/catalog/descs/descriptor.go:137 [n6,client=127.0.0.1:54446,hostssl,user=root] looking up descriptors for ids [105]
             0.095ms      0.012ms        event:sql/catalog/lease/descriptor_version_state.go:143 [n6,client=127.0.0.1:54446,hostssl,user=root] descriptorVersionState.incRefCount: 105("public",010180c382fe9b908b44c89850dce01bd13795) ver=1:1715970820.557938955,0, refcount=1
             0.107ms      0.012ms        event:sql/catalog/lease/descriptor_version_state.go:143 [n6,client=127.0.0.1:54446,hostssl,user=root] descriptorVersionState.incRefCount: 108("global",010180c382fe9b908b44c89850dce01bd13795) ver=1:1715970842.814311713,0, refcount=1
             0.115ms      0.008ms        event:sql/catalog/descs/descriptor.go:137 [n6,client=127.0.0.1:54446,hostssl,user=root] looking up descriptors for ids [108]
             0.129ms      0.014ms        event:sql/plan_opt.go:406 [n6,client=127.0.0.1:54446,hostssl,user=root] query cache hit but needed update
             0.137ms      0.008ms        event:sql/opt/optbuilder/builder.go:217 [n6,client=127.0.0.1:54446,hostssl,user=root] optbuilder start
             0.147ms      0.010ms        event:sql/catalog/descs/descriptor.go:137 [n6,client=127.0.0.1:54446,hostssl,user=root] looking up descriptors for ids [100]
             0.156ms      0.009ms        event:sql/catalog/descs/descriptor.go:137 [n6,client=127.0.0.1:54446,hostssl,user=root] looking up descriptors for ids [104]
             0.163ms      0.008ms        event:sql/catalog/descs/descriptor.go:137 [n6,client=127.0.0.1:54446,hostssl,user=root] looking up descriptors for ids [105]
             0.171ms      0.008ms        event:sql/catalog/descs/descriptor.go:137 [n6,client=127.0.0.1:54446,hostssl,user=root] looking up descriptors for ids [108]
             0.238ms      0.067ms        event:sql/opt/optbuilder/builder.go:263 [n6,client=127.0.0.1:54446,hostssl,user=root] optbuilder finish
             0.246ms      0.009ms        event:sql/plan_opt.go:406 [n6,client=127.0.0.1:54446,hostssl,user=root] optimizing (no placeholders)
             0.253ms      0.007ms        event:sql/opt/xform/optimizer.go:241 [n6,client=127.0.0.1:54446,hostssl,user=root] optimize start
             0.291ms      0.038ms        event:sql/opt/xform/optimizer.go:290 [n6,client=127.0.0.1:54446,hostssl,user=root] optimize finish
             0.342ms      0.342ms    event:sql/conn_executor_exec.go:1150 [n6,client=127.0.0.1:54446,hostssl,user=root] planning ends
             0.354ms      0.012ms    event:sql/conn_executor_exec.go:1150 [n6,client=127.0.0.1:54446,hostssl,user=root] checking distributability
             0.365ms      0.011ms    event:sql/conn_executor_exec.go:1150 [n6,client=127.0.0.1:54446,hostssl,user=root] will distribute plan: false
             0.375ms      0.011ms    event:sql/conn_executor_exec.go:1150 [n6,client=127.0.0.1:54446,hostssl,user=root] executing after 0 retries, last retry reason: <nil>
             0.386ms      0.010ms    event:sql/conn_executor_exec.go:1150 [n6,client=127.0.0.1:54446,hostssl,user=root] execution starts: distributed engine
             0.396ms      0.010ms    event:sql/distsql_running.go:1979 [n6,client=127.0.0.1:54446,hostssl,user=root] creating DistSQL plan with isLocal=true
             0.407ms      0.012ms    event:sql/distsql_running.go:835 [n6,client=127.0.0.1:54446,hostssl,user=root] running DistSQL plan
             0.410ms      0.003ms        === operation:flow _verbose:1 node:6 client:127.0.0.1:54446 hostssl: user:root
             0.410ms      0.000ms        [colbatchscan: {count: 1, duration 1ms, unfinished}]
             0.410ms      0.000ms        [batch flow coordinator: {count: 1, duration 1ms}]
             0.410ms      0.000ms        [txn coordinator send: {count: 1, duration 989µs}]
             0.410ms      0.000ms        [dist sender send: {count: 1, duration 953µs}]
             0.410ms      0.000ms        [/cockroach.roachpb.Internal/Batch: {count: 3, duration 875µs}]
             0.423ms      0.013ms        event:sql/colflow/vectorized_flow.go:231 [n6,client=127.0.0.1:54446,hostssl,user=root] setting up vectorized flow
             0.439ms      0.016ms        event:sql/colflow/vectorized_flow.go:272 [n6,client=127.0.0.1:54446,hostssl,user=root] vectorized flow setup succeeded
             0.450ms      0.011ms        event:sql/flowinfra/flow.go:465 [n6,client=127.0.0.1:54446,hostssl,user=root] starting (0 processors, 0 startables) asynchronously
             0.458ms      0.007ms        event:sql/colflow/vectorized_flow.go:319 [n6,client=127.0.0.1:54446,hostssl,user=root] running the batch flow coordinator in the flow's goroutine
             0.458ms      0.000ms            === operation:batch flow coordinator _verbose:1 node:6 client:127.0.0.1:54446 hostssl: user:root cockroach.flowid:764b638f-ca96-40fb-9c13-fd51ccf1e08e cockroach.processorid:0
             0.458ms      0.000ms            [colbatchscan: {count: 1, duration 1ms, unfinished}]
             0.458ms      0.000ms            [txn coordinator send: {count: 1, duration 989µs}]
             0.458ms      0.000ms            [dist sender send: {count: 1, duration 953µs}]
             0.458ms      0.000ms            [/cockroach.roachpb.Internal/Batch: {count: 3, duration 875µs}]
             0.461ms      0.002ms                === operation:colbatchscan _unfinished:1 _verbose:1 node:6 client:127.0.0.1:54446 hostssl: user:root cockroach.flowid:764b638f-ca96-40fb-9c13-fd51ccf1e08e cockroach.processorid:0
             0.461ms      0.000ms                [txn coordinator send: {count: 1, duration 989µs}]
             0.461ms      0.000ms                [dist sender send: {count: 1, duration 953µs}]
             0.461ms      0.000ms                [/cockroach.roachpb.Internal/Batch: {count: 3, duration 875µs}]
             0.483ms      0.022ms                event:sql/row/kv_batch_fetcher.go:535 [n6,client=127.0.0.1:54446,hostssl,user=root] Scan /Table/108/1/1/0
             0.485ms      0.002ms                    === operation:txn coordinator send _verbose:1 node:6 client:127.0.0.1:54446 hostssl: user:root txnID:8bc14484-d4f1-4ed8-858e-f8d1fb9bb311
             0.485ms      0.000ms                    [dist sender send: {count: 1, duration 953µs}]
             0.485ms      0.000ms                    [/cockroach.roachpb.Internal/Batch: {count: 3, duration 875µs}]
             0.490ms      0.005ms                        === operation:dist sender send _verbose:1 node:6 client:127.0.0.1:54446 hostssl: user:root txn:8bc14484
             0.490ms      0.000ms                        [/cockroach.roachpb.Internal/Batch: {count: 3, duration 875µs}]
             0.507ms      0.017ms                        event:kv/kvclient/kvcoord/range_iter.go:188 [n6,client=127.0.0.1:54446,hostssl,user=root,txn=8bc14484] querying next range at /Table/108/1/1/0
             0.538ms      0.031ms                        event:kv/kvclient/kvcoord/range_iter.go:225 [n6,client=127.0.0.1:54446,hostssl,user=root,txn=8bc14484] key: /Table/108/1/1/0, desc: r69:/{Table/108-Max} [(n1,s1):1, (n3,s3):5, (n2,s2):8, (n5,s5):6NON_VOTER, (n4,s4):9NON_VOTER, (n6,s6):10NON_VOTER, next=11, gen=24]
             0.551ms      0.013ms                        event:kv/kvclient/kvcoord/dist_sender.go:2574 [n6,client=127.0.0.1:54446,hostssl,user=root,txn=8bc14484] routing to nearest replica; leaseholder not required
             0.572ms      0.021ms                        event:kv/kvclient/kvcoord/dist_sender.go:2637 [n6,client=127.0.0.1:54446,hostssl,user=root,txn=8bc14484] r69: sending batch 1 Get to (n6,s6):10NON_VOTER
             0.583ms      0.011ms                        event:rpc/nodedialer/nodedialer.go:138 [n6,client=127.0.0.1:54446,hostssl,user=root,txn=8bc14484] sending request to local client
             0.591ms      0.008ms                        event:kv/kvclient/kvcoord/transport.go:213 [n6,client=127.0.0.1:54446,hostssl,user=root,txn=8bc14484] sending batch request
             0.596ms      0.004ms                            === operation:/cockroach.roachpb.Internal/Batch _verbose:1 node:6 span.kind:server request:Get [/Table/108/1/1/0], [txn: 8bc14484], [can-forward-ts], [max_span_request_keys: 0], [target_bytes: 10485760]
             0.602ms      0.006ms                            event:server/node.go:1402 [n6] node received request: 1 Get
             0.625ms      0.023ms                            event:kv/kvserver/store_send.go:154 [n6,s6] executing Get [/Table/108/1/1/0], [txn: 8bc14484], [can-forward-ts], [max_span_request_keys: 0], [target_bytes: 10485760]
             0.634ms      0.009ms                            event:kv/kvserver/replica_send.go:180 [n6,s6,r69/10:/{Table/108-Max}] read-only path
             0.643ms      0.009ms                            event:kv/kvserver/concurrency/concurrency_manager.go:214 [n6,s6,r69/10:/{Table/108-Max}] sequencing request
             0.649ms      0.006ms                            event:kv/kvserver/concurrency/concurrency_manager.go:295 [n6,s6,r69/10:/{Table/108-Max}] acquiring latches
             0.657ms      0.008ms                            event:kv/kvserver/concurrency/concurrency_manager.go:339 [n6,s6,r69/10:/{Table/108-Max}] scanning lock table for conflicting locks
             0.689ms      0.032ms                            event:kv/kvserver/replica_follower_read.go:118 [n6,s6,r69/10:/{Table/108-Max}] can't serve follower read; closed timestamp too low by: 1.07062859s; maxClosed: 1715970537.093035200,0 ts: 1715970537.663663790,0 uncertaintyLimit: 1715970538.163663790,0
             0.744ms      0.055ms                            event:kv/kvserver/replica_send.go:199 [n6,s6,r69/10:/{Table/108-Max}] replica.Send got error: [NotLeaseHolderError] lease held by different store; r69: replica (n6,s6):10NON_VOTER not lease holder; current lease is repl=(n1,s1):1 seq=1 start=0,0 epo=1 pro=1715970506.521614979,0
             0.783ms      0.040ms                            event:server/node.go:1459 [n6] error from stores.Send: [NotLeaseHolderError] lease held by different store; r69: replica (n6,s6):10NON_VOTER not lease holder; current lease is repl=(n1,s1):1 seq=1 start=0,0 epo=1 pro=1715970506.521614979,0: "sql txn" meta={id=8bc14484 key=/Min iso=Serializable pri=0.00806334 epo=0 ts=1715970537.663663790,0 min=1715970537.663663790,0 seq=0} lock=false stat=PENDING rts=1715970537.663663790,0 wto=false gul=1715970538.163663790,0
             0.792ms      0.009ms                            event:server/node.go:1496 [n6] node sending response
             0.820ms      0.229ms                        event:kv/kvclient/kvcoord/transport.go:215 [n6,client=127.0.0.1:54446,hostssl,user=root,txn=8bc14484] received batch response
             0.861ms      0.041ms                        event:kv/kvclient/kvcoord/dist_sender.go:2866 [n6,client=127.0.0.1:54446,hostssl,user=root,txn=8bc14484] [NotLeaseHolderError] lease held by different store; r69: replica (n6,s6):10NON_VOTER not lease holder; current lease is repl=(n1,s1):1 seq=1 start=0,0 epo=1 pro=1715970506.521614979,0: "sql txn" meta={id=8bc14484 key=/Min iso=Serializable pri=0.00806334 epo=0 ts=1715970537.663663790,0 min=1715970537.663663790,0 seq=0} lock=false stat=PENDING rts=1715970537.663663790,0 wto=false gul=1715970538.163663790,0
             0.903ms      0.042ms                        event:kv/kvclient/kvcoord/dist_sender.go:3115 [n6,client=127.0.0.1:54446,hostssl,user=root,txn=8bc14484] application error: [NotLeaseHolderError] lease held by different store; r69: replica (n6,s6):10NON_VOTER not lease holder; current lease is repl=(n1,s1):1 seq=1 start=0,0 epo=1 pro=1715970506.521614979,0: "sql txn" meta={id=8bc14484 key=/Min iso=Serializable pri=0.00806334 epo=0 ts=1715970537.663663790,0 min=1715970537.663663790,0 seq=0} lock=false stat=PENDING rts=1715970537.663663790,0 wto=false gul=1715970538.163663790,0
             0.919ms      0.016ms                        event:kv/kvclient/kvcoord/dist_sender.go:2640 [n6,client=127.0.0.1:54446,hostssl,user=root,txn=8bc14484] trying next peer (n1,s1):1
             0.930ms      0.011ms                        event:rpc/nodedialer/nodedialer.go:147 [n6,client=127.0.0.1:54446,hostssl,user=root,txn=8bc14484] sending request to 127.0.0.1:41551
             0.940ms      0.010ms                        event:kv/kvclient/kvcoord/transport.go:213 [n6,client=127.0.0.1:54446,hostssl,user=root,txn=8bc14484] sending batch request
             0.942ms      0.002ms                            === operation:/cockroach.roachpb.Internal/Batch _verbose:1 node:6 client:127.0.0.1:54446 hostssl: user:root txn:8bc14484 span.kind:client
             1.053ms      0.111ms                            === operation:/cockroach.roachpb.Internal/Batch _verbose:1 node:1 span.kind:server request:Get [/Table/108/1/1/0], [txn: 8bc14484], [can-forward-ts], [max_span_request_keys: 0], [target_bytes: 10485760]
             1.064ms      0.012ms                            event:server/node.go:1402 [n1] node received request: 1 Get
             1.086ms      0.022ms                            event:kv/kvserver/store_send.go:154 [n1,s1] executing Get [/Table/108/1/1/0], [txn: 8bc14484], [can-forward-ts], [max_span_request_keys: 0], [target_bytes: 10485760]
             1.094ms      0.008ms                            event:kv/kvserver/replica_send.go:180 [n1,s1,r69/1:/{Table/108-Max}] read-only path
             1.103ms      0.009ms                            event:kv/kvserver/concurrency/concurrency_manager.go:214 [n1,s1,r69/1:/{Table/108-Max}] sequencing request
             1.109ms      0.006ms                            event:kv/kvserver/concurrency/concurrency_manager.go:295 [n1,s1,r69/1:/{Table/108-Max}] acquiring latches
             1.117ms      0.007ms                            event:kv/kvserver/concurrency/concurrency_manager.go:339 [n1,s1,r69/1:/{Table/108-Max}] scanning lock table for conflicting locks
             1.139ms      0.022ms                            event:kv/kvserver/replica_read.go:312 [n1,s1,r69/1:/{Table/108-Max}] can drop latches early for batch (Get [/Table/108/1/1/0], [txn: 8bc14484], [can-forward-ts], [max_span_request_keys: 0], [target_bytes: 10485760]); scanning lock table first to detect conflicts
             1.160ms      0.021ms                            event:kv/kvserver/replica_read.go:124 [n1,s1,r69/1:/{Table/108-Max}] lock table scan complete without conflicts; dropping latches early
             1.171ms      0.010ms                            event:kv/kvserver/replica_read.go:425 [n1,s1,r69/1:/{Table/108-Max}] executing read-only batch
             1.222ms      0.051ms                            event:kv/kvserver/replica_evaluate.go:512 [n1,s1,r69/1:/{Table/108-Max}] evaluated Get command header:<key:"\364\211\211\210" > , txn="sql txn" meta={id=8bc14484 key=/Min iso=Serializable pri=0.00806334 epo=0 ts=1715970537.663663790,0 min=1715970537.663663790,0 seq=0} lock=false stat=PENDING rts=1715970537.663663790,0 wto=false gul=1715970538.163663790,0 : resp=header:<num_keys:1 num_bytes:7 > value:<raw_bytes:"\375\311[\010\n#\002" timestamp:<wall_time:1715970514451360965 > > , err=<nil>
             1.241ms      0.019ms                            event:scan stats: stepped 0 times (0 internal); seeked 1 times (1 internal); block-bytes: (total 8.3 KiB, cached 8.3 KiB, duration 0s); points: (count 1, key-bytes 14 B, value-bytes 7 B, tombstoned: 0) ranges: (count 0), (contained-points 0, skipped-points 0) evaluated requests: 1 gets, 0 scans, 0 reverse scans
             1.250ms      0.009ms                            event:kv/kvserver/replica_read.go:237 [n1,s1,r69/1:/{Table/108-Max}] read completed
             1.264ms      0.014ms                            event:server/node.go:1496 [n1] node sending response
             1.225ms     -0.039ms                            structured:{"@type":"type.googleapis.com/cockroach.roachpb.ScanStats","numInterfaceSeeks":"1","numInternalSeeks":"1","blockBytes":"8515","blockBytesInCache":"8515","keyBytes":"14","valueBytes":"7","pointCount":"1","numGets":"1"}
             1.434ms      0.495ms                        event:kv/kvclient/kvcoord/transport.go:215 [n6,client=127.0.0.1:54446,hostssl,user=root,txn=8bc14484] received batch response
             1.471ms      0.986ms                    event:kv/kvclient/kvcoord/txn_interceptor_span_refresher.go:661 [n6,client=127.0.0.1:54446,hostssl,user=root,txn=8bc14484] recording span to refresh: /Table/108/1/1/0
             1.518ms      1.060ms            event:ComponentStats{ID: {764b638f-ca96-40fb-9c13-fd51ccf1e08e PROCESSOR 0 6 }, KV time: 1ms, KV contention time: 0µs, KV rows decoded: 1, KV bytes read: 11 B, KV gRPC calls: 1, KV pairs read: 1, MVCC step count (ext/int): 0/0, MVCC seek count (ext/int): 1/1, max memory allocated: 20 KiB, sql cpu time: 12µs, batches output: 1, rows output: 1}
             1.525ms      0.007ms            event:ComponentStats{ID: {00000000-0000-0000-0000-000000000000 UNSET 0 ? }}
             1.497ms     -0.028ms            structured:{"@type":"type.googleapis.com/cockroach.sql.distsqlrun.ComponentStats","component":{"flowId":"764b638f-ca96-40fb-9c13-fd51ccf1e08e","type":"PROCESSOR","sqlInstanceId":6},"netRx":{"latency":{},"waitTime":{},"deserializationTime":{},"tuplesReceived":{},"bytesReceived":{},"messagesReceived":{}},"netTx":{"tuplesSent":{},"bytesSent":{},"messagesSent":{}},"kv":{"bytesRead":{"valuePlusOne":"12"},"kvPairsRead":{"valuePlusOne":"2"},"tuplesRead":{"valuePlusOne":"2"},"batchRequestsIssued":{"valuePlusOne":"2"},"kvTime":{"valuePlusOne":"0.001029478s"},"contentionTime":{"valuePlusOne":"0.000000001s"},"numInterfaceSteps":{"valuePlusOne":"1"},"numInternalSteps":{"valuePlusOne":"1"},"numInterfaceSeeks":{"valuePlusOne":"2"},"numInternalSeeks":{"valuePlusOne":"2"},"blockBytes":{"valuePlusOne":"8516"},"blockBytesInCache":{"valuePlusOne":"8516"},"keyBytes":{"valuePlusOne":"15"},"valueBytes":{"valuePlusOne":"8"},"pointCount":{"valuePlusOne":"2"},"pointsCoveredByRangeTombstones":{"valuePlusOne":"1"},"rangeKeyCount":{"valuePlusOne":"1"},"rangeKeyContainedPoints":{"valuePlusOne":"1"},"rangeKeySkippedPoints":{"valuePlusOne":"1"},"kvCpuTime":{},"numGets":{"valuePlusOne":"2"},"numScans":{"valuePlusOne":"1"},"numReverseScans":{"valuePlusOne":"1"}},"exec":{"execTime":{},"maxAllocatedMem":{"valuePlusOne":"20481"},"maxAllocatedDisk":{},"consumedRu":{"valuePlusOne":"1"},"cpuTime":{"valuePlusOne":"0.000011891s"}},"output":{"numBatches":{"valuePlusOne":"2"},"numTuples":{"valuePlusOne":"2"}},"flowStats":{"maxMemUsage":{},"maxDiskUsage":{},"consumedRu":{}}}
             1.520ms      0.023ms            structured:{"@type":"type.googleapis.com/cockroach.sql.distsqlrun.ComponentStats","component":{"flowId":"00000000-0000-0000-0000-000000000000"},"netRx":{"latency":{},"waitTime":{},"deserializationTime":{},"tuplesReceived":{},"bytesReceived":{},"messagesReceived":{}},"netTx":{"tuplesSent":{},"bytesSent":{},"messagesSent":{}},"kv":{"bytesRead":{},"kvPairsRead":{},"tuplesRead":{},"batchRequestsIssued":{},"kvTime":{},"contentionTime":{},"numInterfaceSteps":{},"numInternalSteps":{},"numInterfaceSeeks":{},"numInternalSeeks":{},"blockBytes":{},"blockBytesInCache":{},"keyBytes":{},"valueBytes":{},"pointCount":{},"pointsCoveredByRangeTombstones":{},"rangeKeyCount":{},"rangeKeyContainedPoints":{},"rangeKeySkippedPoints":{},"kvCpuTime":{},"numGets":{},"numScans":{},"numReverseScans":{}},"exec":{"execTime":{},"maxAllocatedMem":{},"maxAllocatedDisk":{},"consumedRu":{},"cpuTime":{}},"output":{"numBatches":{},"numTuples":{}},"flowStats":{"maxMemUsage":{},"maxDiskUsage":{},"consumedRu":{}}}
             1.544ms      1.087ms        event:ComponentStats{ID: {764b638f-ca96-40fb-9c13-fd51ccf1e08e FLOW 0 6 eu-central-1}}
             1.541ms     -0.004ms        structured:{"@type":"type.googleapis.com/cockroach.sql.distsqlrun.ComponentStats","component":{"flowId":"764b638f-ca96-40fb-9c13-fd51ccf1e08e","type":"FLOW","sqlInstanceId":6,"region":"eu-central-1"},"netRx":{"latency":{},"waitTime":{},"deserializationTime":{},"tuplesReceived":{},"bytesReceived":{},"messagesReceived":{}},"netTx":{"tuplesSent":{},"bytesSent":{},"messagesSent":{}},"kv":{"bytesRead":{},"kvPairsRead":{},"tuplesRead":{},"batchRequestsIssued":{},"kvTime":{},"contentionTime":{},"numInterfaceSteps":{},"numInternalSteps":{},"numInterfaceSeeks":{},"numInternalSeeks":{},"blockBytes":{},"blockBytesInCache":{},"keyBytes":{},"valueBytes":{},"pointCount":{},"pointsCoveredByRangeTombstones":{},"rangeKeyCount":{},"rangeKeyContainedPoints":{},"rangeKeySkippedPoints":{},"kvCpuTime":{},"numGets":{},"numScans":{},"numReverseScans":{}},"exec":{"execTime":{},"maxAllocatedMem":{},"maxAllocatedDisk":{},"consumedRu":{},"cpuTime":{}},"output":{"numBatches":{},"numTuples":{}},"flowStats":{"maxMemUsage":{"valuePlusOne":"20481"},"maxDiskUsage":{"valuePlusOne":"1"},"consumedRu":{}}}
             1.568ms      1.160ms    event:sql/conn_executor_exec.go:1150 [n6,client=127.0.0.1:54446,hostssl,user=root] execution ends
             1.576ms      0.009ms    event:sql/conn_executor_exec.go:1150 [n6,client=127.0.0.1:54446,hostssl,user=root] rows affected: 1
             1.619ms      0.042ms        === operation:commit sql txn _verbose:1 node:6 client:127.0.0.1:54446 hostssl: user:root
             1.640ms      0.063ms    event:sql/conn_executor_exec.go:3151 [n6,client=127.0.0.1:54446,hostssl,user=root] AutoCommit. err: <nil>
        

The relevant part of the trace is here, which shows that the follower read couldn't be served because the closed timestamp was too low:

             0.689ms      0.032ms                            event:kv/kvserver/replica_follower_read.go:118 [n6,s6,r69/10:/{Table/108-Max}] can't serve follower read; closed timestamp too low by: 1.07062859s; maxClosed: 1715970537.093035200,0 ts: 1715970537.663663790,0 uncertaintyLimit: 1715970538.163663790,0
             0.744ms      0.055ms                            event:kv/kvserver/replica_send.go:199 [n6,s6,r69/10:/{Table/108-Max}] replica.Send got error: [NotLeaseHolderError] lease held by different store; r69: replica (n6,s6):10NON_VOTER not lease holder; current lease is repl=(n1,s1):1 seq=1 start=0,0 epo=1 pro=1715970506.521614979,0
             0.783ms      0.040ms                            event:server/node.go:1459 [n6] error from stores.Send: [NotLeaseHolderError] lease held by different store; r69: replica (n6,s6):10NON_VOTER not lease holder; current lease is repl=(n1,s1):1 seq=1 start=0,0 epo=1 pro=1715970506.521614979,0: "sql txn" meta={id=8bc14484 key=/Min iso=Serializable pri=0.00806334 epo=0 ts=1715970537.663663790,0 min=1715970537.663663790,0 seq=0} lock=false stat=PENDING rts=1715970537.663663790,0 wto=false gul=1715970538.163663790,0
             0.792ms      0.009ms                            event:server/node.go:1496 [n6] node sending response

Right before this, the test verifies that the correct closed timestamp policy is being used.

wait-for-zone-config-changes db-name=db table-name=global num-voters=3 num-non-voters=3 leaseholder=0 non-voter=5
----

refresh-range-descriptor-cache idx=5 table-name=global
SELECT * FROM db.global WHERE k = 2
----
LEAD_FOR_GLOBAL_READS

Checking if KV has any further insight on what we could do here, or if this might just be a flake. (If there's a fix needed on the SQL side, then feel free to point me in the right direction and move it back to our team.)

rafiss added the T-kv label and removed the T-sql-foundations label on May 21, 2024
blathers-crl bot added this to Incoming in KV on May 21, 2024
miraradeva (Contributor) commented

@rafiss I think we've seen a similar issue here: #115067. It seems like setting kv.closed_timestamp.propagation_slack to values less than the default of 1s has led to flakiness. This test (TestMultiRegionDataDriven) sets it to 0.5s, which is not as bad, but it might be worth trying with the default 1s.
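For reference, a minimal sketch of that experiment, assuming a plain database/sql connection to one of the test cluster's nodes (resetPropagationSlack is a made-up helper name, not something from the test):

// Sketch only: restore the documented 1s default before the follower-read
// assertions run. db is assumed to be a *sql.DB connected to any node of
// the test cluster.
func resetPropagationSlack(t *testing.T, db *sql.DB) {
	t.Helper()
	if _, err := db.Exec(
		`SET CLUSTER SETTING kv.closed_timestamp.propagation_slack = '1s'`,
	); err != nil {
		t.Fatal(err)
	}
}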

miraradeva added the T-sql-foundations label on May 24, 2024
rafiss removed the release-blocker label on May 28, 2024
rafiss self-assigned this on May 28, 2024
exalate-issue-sync bot removed the T-kv label on May 28, 2024
rafiss (Collaborator) commented May 28, 2024

@kvoli - I see that you updated kv.closed_timestamp.propagation_slack to 0.5s in 02f1666. Do you think using the default of 1s would reduce flakiness, as per the above comment?

kvoli (Collaborator) commented May 28, 2024

> @kvoli - I see that you updated kv.closed_timestamp.propagation_slack to 0.5s in 02f1666. Do you think using the default of 1s would reduce flakiness, as per the above comment?

It was not updated to 0.5s in that commit; it seems to have been 0.5s since inception, or at least since b50d297.

It doesn't seem like the setting is relevant for LEAD_FOR_GLOBAL_READS:

// the default LAG_BY_CLUSTER_SETTING closed timestamp policy.
func getFollowerReadLag(st *cluster.Settings) time.Duration {
	targetDuration := closedts.TargetDuration.Get(&st.SV)
	sideTransportInterval := closedts.SideTransportCloseInterval.Get(&st.SV)
	slack := ClosedTimestampPropagationSlack.Get(&st.SV)
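
In other words (my reading, as a hypothetical sketch; usesPropagationSlack is a made-up name, not a CockroachDB function):

// Hypothetical sketch: only the lag-based policy consults
// kv.closed_timestamp.propagation_slack (via getFollowerReadLag above);
// ranges with LEAD_FOR_GLOBAL_READS never read that setting, so raising
// the slack shouldn't change behavior for this test's global table.
func usesPropagationSlack(policy roachpb.RangeClosedTimestampPolicy) bool {
	return policy == roachpb.LAG_BY_CLUSTER_SETTING
}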

rafiss (Collaborator) commented May 29, 2024

Thanks for finding both of those. It looks like the max clock offset is used for global reads:

// getGlobalReadsLead returns the (positive) offset duration from hlc.Now()
// which clients can expect followers of a range with the LEAD_FOR_GLOBAL_READS
// closed timestamp policy to have closed off. This offset is equal to the
// maximum clock offset, allowing present-time (i.e. those not pushed into the
// future) transactions to serve reads from followers.
func getGlobalReadsLead(clock *hlc.Clock) time.Duration {
	return clock.MaxOffset()
}

Does this mean it's just a clock issue, and there's nothing else we can configure in this test that would help?


I dug a bit further into the KV code where the "can't serve follower read" message came from.

requiredFrontier := ba.RequiredFrontier()
maxClosed := r.getCurrentClosedTimestampLocked(ctx, requiredFrontier /* sufficient */)
canServeFollowerRead := requiredFrontier.LessEq(maxClosed)
tsDiff := requiredFrontier.GoTime().Sub(maxClosed.GoTime())
if !canServeFollowerRead {
	uncertaintyLimitStr := "n/a"
	if ba.Txn != nil {
		uncertaintyLimitStr = ba.Txn.GlobalUncertaintyLimit.String()
	}
	// We can't actually serve the read based on the closed timestamp.
	// Signal the clients that we want an update so that future requests can succeed.
	log.Eventf(ctx, "can't serve follower read; closed timestamp too low by: %s; maxClosed: %s ts: %s uncertaintyLimit: %s",
		tsDiff, maxClosed, ba.Timestamp, uncertaintyLimitStr)
	return false
}
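
If I plug the wall times from the trace into that check, the logged delta comes out exactly, which suggests requiredFrontier here is the transaction's global uncertainty limit (the read timestamp plus a 500ms max clock offset):

package main

import (
	"fmt"
	"time"
)

func main() {
	// Wall times copied from the "can't serve follower read" event, in ns.
	const (
		maxClosed        = int64(1715970537093035200)
		ts               = int64(1715970537663663790)
		uncertaintyLimit = int64(1715970538163663790) // = ts + 500ms
	)
	fmt.Println(time.Duration(uncertaintyLimit - maxClosed)) // 1.07062859s, the "too low by" value
	fmt.Println(time.Duration(ts - maxClosed))               // 570.62859ms of actual closed-timestamp lag
}

So roughly 500ms of the 1.07s deficit is the uncertainty window itself, and the remaining ~570ms is how far the follower's closed timestamp actually trailed the read timestamp.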

I'm not familiar with this code, so I'm not sure how this relates to getGlobalReadsLead. But I do see that the getCurrentClosedTimestampLocked function checks the sideTransportClosedTimestamp. Is that related to the kv.closed_timestamp.side_transport_interval setting, which is also modified by this test?

Hoping the KV team can keep helping to track down why we would see the "can't serve follower read; closed timestamp too low by: 1.07062859s" message.

exalate-issue-sync bot added the P-2 label on May 31, 2024