Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MongoDB: Support DBRef pushdown #22027

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

nsaje
Copy link

@nsaje nsaje commented May 20, 2024

Description

This PR adds support for pushing down DBRef fields. DBRefs in MongoDB are special documents that reference another document. They are often used as "foreign keys" so not pushing them down carries a significant performance penalty (you basically load the whole table into Trino every time).

What this PR adds is translation of DBRef columns into their corresponding Mongo names:

  • databaseName -> $db
  • collectionName -> $ref
  • id -> $id

So a filter like WHERE creators.id = 'abcd' becomes something like {'creators.$id': 'abcd'} in Mongo.

Resolves #21739

Additional context and related issues

This is my first PR in Trino so I might be missing some conventions or context - happy to adjust if necessary.

I did have to change certain tests and I didn't have full context with some so leaving inline questions and hoping to discuss on this PR if what I've done is correct or not.

Release notes

(X) Release notes are required, with the following suggested text:

# MongoDB
* Improve read and filter performance when selecting or filtering on DBRef fields. ({issue}`issuenumber`)

Copy link

cla-bot bot commented May 20, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@github-actions github-actions bot added the mongodb MongoDB connector label May 20, 2024
@@ -1693,14 +1693,14 @@ private static Document getDocumentWithDifferentDbRefFieldOrder(Object objectId)
{
return new Document()
.append("_id", new ObjectId("5126bbf64aed4daf9e2ab771"))
.append("creator", new Document().append("collectionName", "doc_creators").append("id", objectId).append("databaseName", "doc_test"));
.append("creator", new Document().append("$ref", "doc_creators").append("$id", objectId).append("$db", "doc_test"));
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're constructing the BSON document by hand here I changed these to actually correspond to correct MongoDB names for collectionName, databaseName etc. Otherwise these documents are in no way related to DBRefs.

}

private static Document documentWithSameDbRefFieldOrder(Object objectId)
{
return new Document()
.append("_id", new ObjectId("5126bbf64aed4daf9e2ab771"))
.append("creator", new Document().append("databaseName", "doc_test").append("collectionName", "doc_creators").append("id", objectId));
.append("creator", new Document().append("$db", "doc_test").append("$ref", "doc_creators").append("$id", objectId));
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

.append("databaseName", "doc_test")
.append("collectionName", "doc_creators")
.append("id", objectId));
.append("$db", "doc_test")
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

.append("databaseName", "test")
.append("collectionName", "creators")
.append("id", objectId));
.append("$db", "test")
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

Copy link

cla-bot bot commented May 20, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

1 similar comment
Copy link

cla-bot bot commented May 20, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Copy link

cla-bot bot commented May 20, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Comment on lines +51 to +53
case DATABASE_NAME -> DATABASE_NAME_NATIVE;
case COLLECTION_NAME -> COLLECTION_NAME_NATIVE;
case ID -> ID_NATIVE;
Copy link
Member

@ebyhr ebyhr May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I remember why we disallowed DBRef. We can't distinguish DBRef from object with same field names.
This PR causes silent correctness issue if the document has an object with databaseName, collectionName and id fields.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ebyhr I don't think that's the case - I added a test to check this scenario.

The only time this would be a problem is if the same collection has documents that hold both actual DBRefs and DBRef-like documents in the same field.

Could we just document this possible but extremely unlikely edge case? IMO the importance of DBRef pushdown is way higher than this concern since without it you're loading the entire collection into memory when filtering or joining by a DBRef field (which makes Trino fairly useless for bigger databases in such cases).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The added test is wrong. Please reorder creator fields as databaseName, collectionName, id.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right - I put them in the order that mongo expects them (docs), not the order trino expects them. If I order them as you said the test does indeed fail.

Is there a way we could signal that a field is a DBRef in its type signature so that we didn't have to rely on field names & order to detect a DBRef down the line?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ebyhr in order to consider DBRef pushdown do you think we should
a) find a way to support it cleanly (how? a custom type?)
b) require it to be explicitly turned on with a disclaimer that it can break on specifically formed documents?

Copy link

cla-bot bot commented May 22, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
mongodb MongoDB connector
Development

Successfully merging this pull request may close these issues.

MongoDB DBRef pushdown
2 participants