-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MongoDB: Support DBRef pushdown #22027
base: master
Are you sure you want to change the base?
Conversation
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla |
@@ -1693,14 +1693,14 @@ private static Document getDocumentWithDifferentDbRefFieldOrder(Object objectId) | |||
{ | |||
return new Document() | |||
.append("_id", new ObjectId("5126bbf64aed4daf9e2ab771")) | |||
.append("creator", new Document().append("collectionName", "doc_creators").append("id", objectId).append("databaseName", "doc_test")); | |||
.append("creator", new Document().append("$ref", "doc_creators").append("$id", objectId).append("$db", "doc_test")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're constructing the BSON document by hand here I changed these to actually correspond to correct MongoDB names for collectionName, databaseName etc. Otherwise these documents are in no way related to DBRefs.
} | ||
|
||
private static Document documentWithSameDbRefFieldOrder(Object objectId) | ||
{ | ||
return new Document() | ||
.append("_id", new ObjectId("5126bbf64aed4daf9e2ab771")) | ||
.append("creator", new Document().append("databaseName", "doc_test").append("collectionName", "doc_creators").append("id", objectId)); | ||
.append("creator", new Document().append("$db", "doc_test").append("$ref", "doc_creators").append("$id", objectId)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above
.append("databaseName", "doc_test") | ||
.append("collectionName", "doc_creators") | ||
.append("id", objectId)); | ||
.append("$db", "doc_test") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above
.append("databaseName", "test") | ||
.append("collectionName", "creators") | ||
.append("id", objectId)); | ||
.append("$db", "test") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above
plugin/trino-mongodb/src/test/java/io/trino/plugin/mongodb/TestMongoConnectorTest.java
Outdated
Show resolved
Hide resolved
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla |
1 similar comment
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla |
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla |
case DATABASE_NAME -> DATABASE_NAME_NATIVE; | ||
case COLLECTION_NAME -> COLLECTION_NAME_NATIVE; | ||
case ID -> ID_NATIVE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I remember why we disallowed DBRef. We can't distinguish DBRef from object with same field names.
This PR causes silent correctness issue if the document has an object with databaseName
, collectionName
and id
fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ebyhr I don't think that's the case - I added a test to check this scenario.
The only time this would be a problem is if the same collection has documents that hold both actual DBRefs and DBRef-like documents in the same field.
Could we just document this possible but extremely unlikely edge case? IMO the importance of DBRef pushdown is way higher than this concern since without it you're loading the entire collection into memory when filtering or joining by a DBRef field (which makes Trino fairly useless for bigger databases in such cases).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The added test is wrong. Please reorder creator
fields as databaseName, collectionName, id.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right - I put them in the order that mongo expects them (docs), not the order trino expects them. If I order them as you said the test does indeed fail.
Is there a way we could signal that a field is a DBRef in its type signature so that we didn't have to rely on field names & order to detect a DBRef down the line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ebyhr in order to consider DBRef pushdown do you think we should
a) find a way to support it cleanly (how? a custom type?)
b) require it to be explicitly turned on with a disclaimer that it can break on specifically formed documents?
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla |
Description
This PR adds support for pushing down DBRef fields. DBRefs in MongoDB are special documents that reference another document. They are often used as "foreign keys" so not pushing them down carries a significant performance penalty (you basically load the whole table into Trino every time).
What this PR adds is translation of DBRef columns into their corresponding Mongo names:
databaseName
->$db
collectionName
->$ref
id
->$id
So a filter like
WHERE creators.id = 'abcd'
becomes something like{'creators.$id': 'abcd'}
in Mongo.Resolves #21739
Additional context and related issues
This is my first PR in Trino so I might be missing some conventions or context - happy to adjust if necessary.
I did have to change certain tests and I didn't have full context with some so leaving inline questions and hoping to discuss on this PR if what I've done is correct or not.
Release notes
(X) Release notes are required, with the following suggested text: