Skip to content

[CALCITE-5132] Scalar IN subquery returns UNKNOWN instead of FALSE when key is partially NULL#4814

Open
xiedeyantu wants to merge 2 commits intoapache:mainfrom
xiedeyantu:CALCITE-5132
Open

[CALCITE-5132] Scalar IN subquery returns UNKNOWN instead of FALSE when key is partially NULL#4814
xiedeyantu wants to merge 2 commits intoapache:mainfrom
xiedeyantu:CALCITE-5132

Conversation

@xiedeyantu
Copy link
Member

@xiedeyantu xiedeyantu marked this pull request as draft March 1, 2026 15:06
@xiedeyantu
Copy link
Member Author

This PR is not yet ready for review. I need to continue checking.

@sonarqubecloud
Copy link

sonarqubecloud bot commented Mar 1, 2026

@xiedeyantu xiedeyantu marked this pull request as ready for review March 2, 2026 14:51
@xiedeyantu
Copy link
Member Author

I think it is ready for review.

// Now the left join
builder.join(JoinRelType.LEFT, builder.and(conditions), variablesSet);

// for multi-column IN with nullable RHS columns, the global ck < c check is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment here needs some context: the actual comparison ck < c is much lower.
Perhaps you can first explain how the expansion is handled for scalar values.

// (empno, deptno) IN (select empno, deptno from ...)
//
// where one RHS row is (7521, null):
// empno=7521: AND(TRUE, UNKNOWN) = UNKNOWN → result should be null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't understand what this AND is

nrAlias = "nr" + subQueryIndex;
}
// Only needed for multi-column, non-literal IN with null-safety logic.
// Single-column cases are handled correctly by the existing ck < c check.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the single column has a ROW type?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the row type still falls under the category of single columns. This scenario involves the form (x, y) IN ((...), (...)).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have a test to confirm that the implementation is correct.
Frankly I don't see a difference between two columns and a ROW type.
If the ROW type works correctly, maybe the two implementations can be unified?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I need to confirm this issue again.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the form of row in, and it is indeed very similar to the current form. Before removing this patch, the row in form also failed to produce the correct result, but after this patch, it executes correctly.

@sonarqubecloud
Copy link

sonarqubecloud bot commented Mar 3, 2026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants