Fix: Allow inline regex options in conditional expression branches#124699
Fix: Allow inline regex options in conditional expression branches#124699danmoseley wants to merge 2 commits intodotnet:mainfrom
Conversation
|
Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions |
There was a problem hiding this comment.
Pull request overview
This PR fixes a regression where inline regex options (e.g., (?i)) were incorrectly blocked inside the yes/no branches of expression conditionals like (?(test)(?i)yes|no). The root cause was an overly broad check that blocked options for the entire conditional instead of just the test expression. The fix distinguishes between parsing phases by checking if the test expression has been added (ChildCount() > 0), allowing options in branches but not in the test expression itself.
Changes:
- Fixed RegexParser to allow inline options in conditional branches by checking
ChildCount() > 0 - Added tests for inline options in various conditional patterns
- Added SetSearchValues optimization for interpreter character class matching with proper Multi literal encoding validation
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexParser.cs | Added ChildCount() > 0 check to allow inline options in conditional branches but not in test expressions |
| src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.Match.Tests.cs | Added 5 test cases for inline options in conditional branches; added regression test for Multi literal encoding validation |
| src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexInterpreterCode.cs | Added SetSearchValues optimization with validation to prevent IndexOutOfRangeException on Multi literals resembling character class encodings |
| src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexInterpreter.cs | Updated Setrep and Setloop opcodes to use SetSearchValues for vectorized matching in left-to-right mode |
src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.Match.Tests.cs
Outdated
Show resolved
Hide resolved
...es/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexInterpreterCode.cs
Outdated
Show resolved
Hide resolved
e80a3d1 to
66a5e46
Compare
Fixes dotnet#111633. The regex parser incorrectly rejected inline options like (?i) inside the yes/no branches of expression conditionals (e.g. (?(test)(?i)yes|no)), throwing InvalidGroupingConstruct. The check in ScanGroupOpen blocked ScanOptions() whenever _group was an ExpressionConditional, but this was too broad - it blocked options in the branches, not just in the test expression. The fix narrows the check to only block options when no test expression has been added yet (ChildCount() == 0), allowing options in the branches (ChildCount > 0). Note: BackreferenceConditional was never blocked, confirming this was an unintentional inconsistency introduced in corefx PR dotnet#16609. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
66a5e46 to
7eb0fa8
Compare
src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.Match.Tests.cs
Show resolved
Hide resolved
Verify that (?(?i)yes|no) is still rejected as InvalidGroupingConstruct, ensuring the fix only allows options in yes/no branches, not as the test. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
ready for review |
Fixes #111633.
Problem
The regex parser rejects inline options like
(?i)inside the yes/no branches of expression conditionals (e.g.(?(test)(?i)yes|no)), throwingUnrecognized grouping construct. This is a regression from .NET Framework, introduced in corefx PR #16609.BackreferenceConditional(e.g.(?(1)(?i)yes|no)) was never blocked, confirming the restriction onExpressionConditionalwas unintentional.Root Cause
In
RegexParser.ScanGroupOpen(),ScanOptions()was skipped whenever_group.Kind == ExpressionConditional. This was intended to protect the test expression, but by the time we're parsing the yes/no branches,_groupis still theExpressionConditionalnode (the test expression is parsed inside its own sub-group). So the check blocked options in the branches rather than in the test.Fix
Changed the condition to also check
_group.ChildCount() > 0: options are only blocked when the conditional has no test expression yet (ChildCount == 0), allowing them in the branches after the test expression has been added (ChildCount > 0).Testing