[Refactor] Add new AST node types and resolve AST TODOs#99
[Refactor] Add new AST node types and resolve AST TODOs#99colinthebomb1 wants to merge 6 commits intomainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR refactors the internal AST representation to better model several SQL constructs (types/keywords, value lists, intervals, CASE expressions, and DISTINCT variants) and updates the expected AST fixtures accordingly.
Changes:
- Introduces new AST node types:
TypeNode,ListNode,IntervalNode, andCaseNode, plus newNodeTypeenum values. - Extends
SelectNodeto representDISTINCTandDISTINCT ON. - Updates
data/asts.pyexpected ASTs (and tweaks formatter tests) to use the new node types.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
core/ast/node.py |
Adds new node classes and extends SelectNode to model DISTINCT/DISTINCT ON. |
core/ast/enums.py |
Adds NodeType enum values for the new node types. |
data/asts.py |
Updates expected AST fixtures to use TypeNode, ListNode, IntervalNode, CaseNode, and DISTINCT metadata. |
tests/test_query_formatter.py |
Adjusts/relaxes formatter tests by commenting out some formatter calls and minor assertion key change. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def __init__(self, _items: List['Node'], **kwargs): | ||
| """SELECT clause node. _distinct_on is the list of expressions for DISTINCT ON (e.g. ListNode of columns).""" | ||
| def __init__(self, _items: List['Node'], _distinct: bool = False, _distinct_on: Optional['Node'] = None, **kwargs): | ||
| super().__init__(NodeType.SELECT, children=_items, **kwargs) |
There was a problem hiding this comment.
SelectNode tracks _distinct_on but does not include it in children. Any generic AST traversal that relies on children will miss DISTINCT ON expressions, which can lead to incorrect rewrites/formatting/analysis. Consider representing DISTINCT ON as part of the node’s subtree (e.g., include it in children with a dedicated wrapper node/field-aware traversal) and update consumers accordingly.
| super().__init__(NodeType.SELECT, children=_items, **kwargs) | |
| # Include DISTINCT ON expressions in children so generic AST traversals see them. | |
| children: List[Node] = list(_items) if _items is not None else [] | |
| if _distinct_on is not None: | |
| children.append(_distinct_on) | |
| super().__init__(NodeType.SELECT, children=children, **kwargs) |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Overview:
This PR adds new node types and adds functionality to existing ones to improve the constructed ASTs for test files, resolving most of the TODOs identified in PR #97.
Code Changes:
TypeNodefor SQL type keywords (TEXT,DATE,NULL, etc.) — replacesFunctionNode("TEXT", _args=[LiteralNode("{}")])ListNodefor value lists (e.g. the RHS ofINexpressions) — replaces raw Python listsIntervalNodeforINTERVALexpressions — replacesFunctionNode("INTERVAL", ...)CaseNodeforCASE WHEN ... THEN ... ELSE ... END— replaces nestedFunctionNode("CASE"/"WHEN"/"THEN"/"ELSE")_distinctand_distinct_onparameters toSelectNodeforSELECT DISTINCT/DISTINCT ONNodeType.TYPE,LIST,INTERVAL,CASEto enumsdata/asts.pyto use the new node types