The scan range behaviour in MinIO is not consistent with the official S3 behaviour. It was implemented in PR #14546; the discrepancy was flagged during the review here and acknowledged, but left unchanged for an undisclosed reason. As it stands, the implementation reads only up to the end byte specified in the request.
AWS documentation of the feature is available here.
Expected Behavior
From AWS docs:
An Amazon S3 Select scan range request runs across the byte range that you specify. A record that starts within the scan range specified but extends beyond the scan range will be processed by the query.
Current Behavior
MinIO returns exactly the bytes in the scan range, so a record that starts within the range but extends past the end byte comes back truncated.
Instead, if the start byte falls partway through a record, it should read up to the next delimiter, discard that partial record, and then process the remaining bytes, returning every record found in them. If the final byte of the range is a delimiter, it should stop there; otherwise it should continue reading past the end byte until the next delimiter is reached and return the record that delimiter completes.
Possible Solution
The changes would need to be made to internal/s3select/select.go
Steps to Reproduce (for bugs)
Make any select-object-content request with a scan range whose end byte falls in the middle of a record, and observe that the last record in the response is truncated.
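To make the difference concrete, the truncating behaviour described in this issue can be modelled as splitting exactly the bytes in the range. This is a hypothetical illustration (naiveRangeRecords is not MinIO code), assuming newline-delimited records and an inclusive end byte.

```go
package main

import "strings"

// naiveRangeRecords (hypothetical) models the reported behaviour: it splits
// exactly the bytes in [start, end] into records, so a record cut by either
// boundary comes back partial.
func naiveRangeRecords(data []byte, start, end int64) []string {
	if start < 0 || start > end || start >= int64(len(data)) {
		return nil
	}
	if end >= int64(len(data)) {
		end = int64(len(data)) - 1
	}
	chunk := string(data[start : end+1])
	// Avoid an empty trailing element when the range ends on a delimiter.
	chunk = strings.TrimSuffix(chunk, "\n")
	return strings.Split(chunk, "\n")
}
```

For "a,1\nbb,2\nccc,3\n" and the range 2 to 9, this yields "1", "bb,2" and "c": a fragment of the first record and a truncated last record, where the AWS rule would give "bb,2" and "ccc,3".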
Context
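Correct scan range behaviour is more or less a requirement for processing large CSV files in chunks: chunks can only be queried independently if every record that crosses a chunk boundary is processed exactly once. The current implementation makes the Select feature materially less useful.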
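With the AWS semantics, a client can split an object into consecutive, non-overlapping inclusive ranges and issue one request per range; each record starts in exactly one range, so it is returned exactly once. A minimal Go sketch, assuming the object size is already known (for example from a HEAD request); chunkRanges is a hypothetical helper name.

```go
package main

// chunkRanges (hypothetical) splits an object of size bytes into
// consecutive inclusive [start, end] scan ranges of at most chunk bytes.
// The ranges cover every byte exactly once and never overlap.
func chunkRanges(size, chunk int64) [][2]int64 {
	if size <= 0 || chunk <= 0 {
		return nil
	}
	var ranges [][2]int64
	for start := int64(0); start < size; start += chunk {
		end := start + chunk - 1
		if end > size-1 {
			end = size - 1 // last range stops at the final byte
		}
		ranges = append(ranges, [2]int64{start, end})
	}
	return ranges
}
```

For a 15-byte object and a chunk size of 5 this yields [0,4], [5,9] and [10,14]. Under the truncating behaviour described above, a record straddling one of these boundaries would instead come back split across two responses.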
Regression
No