Introduce the LSH compiler #753

Open
lhecker wants to merge 6 commits into main from dev/lhecker/syntax-highlighting-compiler
lhecker (Member) commented Jan 27, 2026

This PR contains only the compiler itself. The CLI frontend and everything
else were split out to reduce the PR size.

Part of #624

DHowett (Member) left a comment

14/19 plus high water mark of 171 on generator.rs; still ongoing

JumpIfMatchPrefixInsensitive { idx: u32, tgt: u32 },

// Flushes the current HighlightKind to the output.
FlushHighlight { kind: Register },

(why is this its own instruction instead of a register? i guess having a special register would mean you need to check writes to every register to see if it was the special output register. however, you already have that with pc...)
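
To make the trade-off in this question concrete, here is a minimal sketch of both designs. Everything in it is hypothetical: the `Vm` struct, the register layout, and the `HL` index are assumptions for illustration and are not taken from the PR.

```rust
/// Hypothetical index of a special "highlight" register (not from the PR).
const HL: usize = 1;

/// Toy VM state: a small register file plus the list of flushed kinds.
struct Vm {
    regs: [u32; 4],
    flushed: Vec<u32>,
}

impl Vm {
    /// Variant A: a dedicated instruction, in the spirit of `FlushHighlight`.
    /// Plain register writes need no extra check.
    fn flush_highlight(&mut self, kind_reg: usize) {
        self.flushed.push(self.regs[kind_reg]);
    }

    /// Variant B: a special output register.
    /// Every register write has to test whether the target is `HL`.
    fn write(&mut self, reg: usize, val: u32) {
        self.regs[reg] = val;
        if reg == HL {
            self.flushed.push(val);
        }
    }
}
```

In variant B the check runs on every write, which is the cost the comment alludes to; whether that cost is already paid for `pc` depends on how the real VM dispatches register writes.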

//! | `[a-z]?` | `Charset{cs, min=0, max=1}` - optional char |
//! | `$` | `EndOfLine` condition |
//! | `.*` | `MovImm off, MAX` - skip to end of line |
//! | `\>` | `If Charset(\w) then FAIL else MATCH` - word boundary |

this feels like a vim-ism; everyone else uses \b for word boundary... right?
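
For context on the difference: Vim's `\>` matches only at the end of a word, while `\b` in PCRE-style dialects matches at either edge of a word. The sketch below shows both checks under the assumption that `\w` means ASCII letters, digits, and underscore. The function names are hypothetical and not part of this PR. Note that the table row above only tests the following character, whereas this sketch also looks at the preceding one.

```rust
/// ASCII word byte, i.e. what `\w` is assumed to mean here.
fn is_word(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Returns (previous byte is a word byte, next byte is a word byte)
/// for the zero-width position `i` in `s`.
fn neighbours(s: &[u8], i: usize) -> (bool, bool) {
    let prev = i > 0 && s.get(i - 1).map_or(false, |&b| is_word(b));
    let next = s.get(i).map_or(false, |&b| is_word(b));
    (prev, next)
}

/// Vim-style `\>`: a word ends here.
fn at_word_end(s: &[u8], i: usize) -> bool {
    let (prev, next) = neighbours(s, i);
    prev && !next
}

/// PCRE-style `\b`: exactly one neighbour is a word byte.
fn at_word_boundary(s: &[u8], i: usize) -> bool {
    let (prev, next) = neighbours(s, i);
    prev != next
}
```

On the input `foo bar`, position 3 (right after `foo`) satisfies both checks, while position 4 (right before `bar`) satisfies only the `\b`-style one.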

Comment on lines +30 to +31
//! | `[a-z]+` | `Charset{cs, min=1, max=∞}` - greedy char class |
//! | `[a-z]?` | `Charset{cs, min=0, max=1}` - optional char |

actually these are Repeat{Charset{cs}, min=..., max=...}

nm, not in the IR they're not! only in the regex layer. ignore

}
}

/// a|b|c

"or a", because this is also the parser for the no-alternation case

Ok(Regex::Group { inner: Box::new(inner), capturing: false })
}
Some('i') => {
// Case-insensitive (?i:...)

does this mean we can't have a capturing ?i?


let mut charset = Charset::no();

// First char can be ] or - literally

👀 - is not handled here
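
As an illustration of what handling both cases could look like, here is a minimal sketch of a bracket-expression parser that treats a leading `]` or `-` as a literal. It is an assumption-laden stand-in: `parse_class` is a hypothetical helper, it uses a plain `[bool; 256]` instead of the PR's `Charset`, and it ignores negation (`^`) and escapes.

```rust
/// Parses the body of a bracket expression, i.e. the bytes after `[`.
/// Returns the accepted byte set and the number of bytes consumed,
/// including the closing `]`. Returns `None` on an unterminated class
/// or a reversed range such as `z-a`.
fn parse_class(src: &[u8]) -> Option<([bool; 256], usize)> {
    let mut set = [false; 256];
    let mut i = 0;

    // A leading `]` or `-` can neither close the class nor start a range,
    // so it is taken literally.
    if let Some(&b) = src.first() {
        if b == b']' || b == b'-' {
            set[b as usize] = true;
            i = 1;
        }
    }

    while i < src.len() {
        let lo = src[i];
        if lo == b']' {
            return Some((set, i + 1));
        }
        // `lo-hi` is a range unless the `-` is directly followed by `]`.
        if i + 2 < src.len() && src[i + 1] == b'-' && src[i + 2] != b']' {
            let hi = src[i + 2];
            if lo > hi {
                return None;
            }
            for b in lo..=hi {
                set[b as usize] = true;
            }
            i += 3;
        } else {
            set[lo as usize] = true;
            i += 1;
        }
    }
    None
}
```

With this sketch, `]a-c]` yields the set `]`, `a`, `b`, `c`, and `-x]` yields `-` and `x`. A trailing `-` as in `a-]` is also kept as a literal.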

optimize_noop(compiler);
}

/// This isn't an optimization for the VM, it's one for my autistic side.

nope. can't say this, i'm sorry.
