Performance: 53% faster parse+render, 61% fewer allocations by tobi · Pull Request #2056 · Shopify/liquid

tobi · 2026-03-11T13:47:48Z

Summary

53% faster combined parse+render time, 61% fewer object allocations on the ThemeRunner benchmark (real Shopify theme templates with production-like data). Zero test regressions — all 974 unit tests pass.

Metric	Main	This PR	Change
Combined (parse+render)	7,469µs	3,534µs	-53%
Parse time	6,031µs	2,353µs	-61%
Render time	1,438µs	1,146µs	-20%
Object allocations	62,620	24,530	-61%

Measured with YJIT enabled on Ruby 3.4, using performance/bench_quick.rb (best of 3 runs, 10 iterations each with GC disabled, after 20-iteration warmup).
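The measurement protocol above (warmup, GC disabled, best of several runs) can be sketched roughly as follows. This is not the contents of performance/bench_quick.rb; the `best_of` helper, its parameter names, and the stand-in workload are all hypothetical.

```ruby
# Hypothetical harness illustrating "best of N runs, GC disabled, after warmup".
# It is NOT the actual performance/bench_quick.rb.
def best_of(runs: 3, iterations: 10, warmup: 20)
  warmup.times { yield }                 # give the JIT a chance to compile hot paths

  best = Float::INFINITY
  runs.times do
    GC.start                             # begin each run from a freshly collected heap
    GC.disable                           # keep collector pauses out of the timed region
    begin
      t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC, :microsecond)
      iterations.times { yield }
      t1 = Process.clock_gettime(Process::CLOCK_MONOTONIC, :microsecond)
    ensure
      GC.enable
    end
    per_iteration = (t1 - t0) / iterations.to_f
    best = per_iteration if per_iteration < best
  end
  best
end

# Stand-in workload; a real run would parse and render templates here.
micros = best_of { Array.new(1_000) { |i| i.to_s } }
puts "best: #{micros.round(1)} microseconds per iteration"
```

Taking the minimum rather than the mean is a common choice for micro-benchmarks because noise from the rest of the system only ever adds time.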

Methodology

This PR was developed through ~120 automated experiments using an autoresearch loop: edit → commit → run tests → benchmark → keep/discard. Each change was validated against the full unit test suite before benchmarking. Changes that regressed either correctness or the primary metric were reverted immediately.

The approach was allocation-driven: profile where objects are created, eliminate the ones that aren't needed, and defer the ones that are. With GC consuming 74% of total CPU time, every avoided allocation has outsized impact on wall-clock performance.
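As a rough illustration of allocation-driven measurement, the sketch below counts objects created while a block runs, using `GC.stat(:total_allocated_objects)`. The `allocations` helper and the `MARKUP` constant are hypothetical; the comparison mirrors the "skip strip when no whitespace" idea listed later in this description.

```ruby
# Count objects allocated while the given block runs.
def allocations
  GC.disable
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
ensure
  GC.enable
end

MARKUP = "product.title".freeze   # hypothetical input with no surrounding whitespace

# String#strip returns a new String even when there is nothing to strip.
always_strip = allocations { 1_000.times { MARKUP.strip } }

# Checking first avoids the copy in the common case.
conditional_strip = allocations do
  1_000.times { MARKUP.match?(/\A\s|\s\z/) ? MARKUP.strip : MARKUP }
end

puts "always strip:      #{always_strip} objects"
puts "conditional strip: #{conditional_strip} objects"
```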

Architecture changes

1. Cursor class (`lib/liquid/cursor.rb`)

A StringScanner wrapper with higher-level methods tuned for Liquid's grammar. One Cursor per ParseContext, reused across all tag/variable/expression parsing:

cursor = parse_context.cursor
cursor.reset(markup)
cursor.skip_ws
tag_name = cursor.scan_tag_name   # C-level regex via StringScanner
cursor.expect_id("in")            # zero-alloc: regex skip + byte compare
cursor.skip_fragment              # zero-alloc: regex skip

Key insight from tenderlove's article on fast tokenizers: C-level StringScanner.scan/skip with compiled regexes is 2-3x faster than Ruby-level peek_byte/scan_byte loops. Methods that previously had 20+ lines of manual byte scanning are now 1-3 line regex delegations.
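To make the shape of such a wrapper concrete, here is a heavily reduced sketch. The contents of lib/liquid/cursor.rb are not shown in this description, so the class name `MiniCursor` and every method body below are assumptions. In particular, this `expect_id` allocates the scanned identifier, unlike the zero-allocation variant described above.

```ruby
require "strscan"

# Hypothetical, reduced cursor: one StringScanner reused across many markups.
class MiniCursor
  WS = /\s+/
  ID = /[a-zA-Z_][\w-]*\??/

  def initialize
    @ss = StringScanner.new("")
  end

  # Point the shared scanner at new markup instead of building a scanner per tag.
  def reset(source)
    @ss.string = source
    self
  end

  def skip_ws = @ss.skip(WS)
  def scan_id = @ss.scan(ID)
  def eos?    = @ss.eos?

  # Consume `word` only if it is the next whole identifier; otherwise rewind.
  def expect_id(word)
    start = @ss.pos
    return true if @ss.scan(ID) == word
    @ss.pos = start
    false
  end
end

cursor = MiniCursor.new
cursor.reset("item in collection")
variable = cursor.scan_id            # "item"
cursor.skip_ws
has_in = cursor.expect_id("in")      # true
cursor.skip_ws
collection = cursor.scan_id          # "collection"
puts [variable, has_in, collection].inspect
```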

2. `String#byteindex` tokenizer

Replaced StringScanner-based tokenizer with String#byteindex for finding {% and {{ delimiters. The tokenizer accounts for ~30% of parse time, and byteindex('{', pos) is ~40% faster than StringScanner#skip_until(/\{[\{\%]/) for single-byte searching. Variable token scanning uses manual byte inspection matching the original tokenizer's exact edge-case handling (unclosed tags, {{ → {% nesting).
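The delimiter search can be sketched as below. This is a simplified illustration, not the tokenizer from this PR: the `tokenize` function is hypothetical, it requires Ruby 3.2+ for `String#byteindex`, and it deliberately omits the edge cases mentioned above ({{ → {% nesting, delimiters inside quoted strings). An unclosed tag is simply left as text.

```ruby
OPEN_CURLY = "{".ord
PERCENT    = "%".ord

# Split a template into raw text, {{ ... }} and {% ... %} tokens.
def tokenize(source)
  tokens = []
  pos = 0
  text_start = 0
  len = source.bytesize

  while pos < len
    brace = source.byteindex("{", pos)          # single-byte search, no regex
    break unless brace

    closer = case source.getbyte(brace + 1)
             when OPEN_CURLY then "}}"
             when PERCENT    then "%}"
             end
    unless closer
      pos = brace + 1                           # a lone "{" is ordinary text
      next
    end

    close = source.byteindex(closer, brace + 2)
    break unless close                          # unclosed: rest stays text in this sketch

    tokens << source.byteslice(text_start, brace - text_start) if brace > text_start
    tokens << source.byteslice(brace, close + 2 - brace)
    pos = text_start = close + 2
  end

  tokens << source.byteslice(text_start, len - text_start) if text_start < len
  tokens
end

p tokenize("Hi {{ name }}! {% if x %}y{% endif %}")
```

One property worth checking in any variant of this approach is that joining the tokens reproduces the input exactly.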

3. Zero-Lexer variable parsing

100% of variables in the benchmark (1,197) now parse through Variable#try_fast_parse — a byte-level scanner that extracts the name expression and filter chain without touching the Lexer or Parser. Zero Lexer/Parser fallbacks. Even multi-argument filters like pluralize: 'item', 'items' are scanned directly with comma-separated arg handling. Only keyword arguments (key: value) would fall through (none appear in the benchmark).
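The general shape of a fast path with fallback can be sketched as follows. Only the method name `try_fast_parse` comes from the description; the body, the two regex constants, and the return format are assumptions. Unlike the byte-level scanner described above, this version uses `split` and regexes for brevity and does not handle filter arguments at all: anything it does not recognize returns nil so a full Lexer/Parser pass can take over.

```ruby
# Hypothetical patterns for "simple" names and argument-less filters.
SIMPLE_NAME   = /\A[a-zA-Z_][\w-]*(?:\.[a-zA-Z_][\w-]*)*\z/
SIMPLE_FILTER = /\A[a-zA-Z_]\w*\z/

# Returns [name, filters] for `lookup` or `lookup | filter | filter`,
# or nil when the markup needs the full parser.
def try_fast_parse(markup)
  name, *filters = markup.split("|", -1).map(&:strip)
  return nil unless name && name.match?(SIMPLE_NAME)
  return nil unless filters.all? { |f| f.match?(SIMPLE_FILTER) }
  [name, filters]
end

p try_fast_parse("product.title | upcase | strip")
p try_fast_parse("count | pluralize: 'item', 'items'")   # falls through in this sketch
```

Returning nil instead of raising keeps the fallback cheap: the caller can hand the untouched markup to the slower path.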

What changed (by impact)

Parse optimizations (~61% faster, ~38K fewer allocs)

Replaced StringScanner tokenizer with String#byteindex. Single-byte byteindex searching is ~40% faster than regex-based skip_until. This alone reduced parse time by ~12%.

Pure-byte parse_tag_token. Eliminated the costly StringScanner#string= reset that was called for every {% %} token (878 times). Manual byte scanning for tag name + markup extraction is faster than resetting and re-scanning via StringScanner.

Replaced regex with Cursor scanning in hot paths. FullToken regex → Cursor, VariableParser regex → manual byte scanner, For#Syntax regex → Cursor, If#SIMPLE_CONDITION regex → Cursor, INTEGER_REGEX/FLOAT_REGEX → Cursor scan_number, WhitespaceOrNothing regex → match?.

Fast-path Variable initialization. All variables parse through try_fast_parse which extracts name + filters via byte-level scanning. Cached no-arg filter tuples (NO_ARG_FILTER_CACHE) avoid repeated [name, EMPTY_ARRAY] creation.

Fast-path VariableLookup. simple_lookup? uses match? regex (8x faster than byte scan). Simple identifier chains skip scan_variable entirely.

Avoid unnecessary string allocations. Expression.parse skips strip when no whitespace. Variable fast-path reuses markup string directly when possible. block_delimiter strings cached per tag name.
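The cached no-arg filter tuples mentioned above might look roughly like this. The constant names `NO_ARG_FILTER_CACHE` and `EMPTY_ARRAY` appear in the description, but how the cache is populated is not shown, so the lazily filled Hash and the `no_arg_filter` helper below are assumptions.

```ruby
EMPTY_ARRAY = [].freeze

# One frozen [name, EMPTY_ARRAY] tuple per filter name, built on first use.
NO_ARG_FILTER_CACHE = Hash.new do |cache, name|
  key = name.dup.freeze
  cache[key] = [key, EMPTY_ARRAY].freeze
end

# Hypothetical accessor used by the parser for filters without arguments.
def no_arg_filter(name)
  NO_ARG_FILTER_CACHE[name]
end

first  = no_arg_filter("upcase")
second = no_arg_filter("upcase")
puts first.equal?(second)   # the same tuple object is reused
```

A process-wide cache like this grows with the number of distinct filter names it sees, so it only suits small, bounded key sets; a sketch like this one is also not synchronized for concurrent writers.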

Render optimizations (~20% faster, ~3K fewer allocs)

Splat-free filter invocation. invoke_single/invoke_two avoid *args array allocation for 90% of filter calls.

Primitive type fast paths. find_variable returns immediately for String/Integer/Float/Array/Hash/nil — skipping to_liquid and respond_to?(:context=). Same in VariableLookup#evaluate. Hash fast-path via instance_of?(Hash) before respond_to? chain.

Cached small integer to_s. Pre-computed frozen strings for 0-999 avoid 267 Integer#to_s allocations per render.

Condition#evaluate fast path. Skip loop do...end block when no child_relation — avoids closure allocation for all benchmark conditions.

While loop for If#@blocks.each. Avoids Proc creation for 1-2 element arrays (YJIT optimizes each better for long arrays, but while wins for short ones).

Lazy initialization. Context defers StringScanner and @interrupts. Registers defers @changes hash. static_environments uses EMPTY_ARRAY when empty.
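Two of the render-side ideas above, splat-free dispatch and the small-integer string cache, can be sketched as follows. The method names `invoke_single` and `invoke_two` come from the description; `DemoFilters`, `DemoStrainer`, `SMALL_INT_STRINGS`, `int_to_s`, and all method bodies are hypothetical.

```ruby
# Hypothetical filter host standing in for the real filter modules.
module DemoFilters
  def self.upcase(input) = input.upcase
  def self.append(input, suffix) = input + suffix
  def self.replace(input, from, to) = input.gsub(from, to)
end

class DemoStrainer
  # General path: *args builds an Array on every call.
  def invoke(name, input, *args)
    DemoFilters.public_send(name, input, *args)
  end

  # Fixed-arity paths: no splat, so no per-call argument array.
  def invoke_single(name, input)
    DemoFilters.public_send(name, input)
  end

  def invoke_two(name, input, arg)
    DemoFilters.public_send(name, input, arg)
  end
end

# Pre-computed frozen strings for 0..999, so rendering a small integer
# returns a shared String instead of allocating a new one.
SMALL_INT_STRINGS = Array.new(1000) { |i| i.to_s.freeze }.freeze

def int_to_s(n)
  n >= 0 && n < 1000 ? SMALL_INT_STRINGS[n] : n.to_s
end

strainer = DemoStrainer.new
puts strainer.invoke_single(:upcase, "liquid")
puts strainer.invoke_two(:append, "liquid", ".rb")
puts int_to_s(42)
```

A caller would pick the fixed-arity method based on the argument count known at parse time and fall back to `invoke` for everything else.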

Code simplified

The Cursor consolidation replaced ~150 scattered getbyte/byteslice calls with a shared vocabulary. Example:

# Before: 15 lines of manual byte scanning
def scan_id
  start = @ss.pos
  b = @ss.peek_byte
  return unless b && ((b >= 97 && b <= 122) || (b >= 65 && b <= 90) || b == USCORE)
  @ss.scan_byte
  while (b = @ss.peek_byte)
    break unless (b >= 97 && b <= 122) || ...
    @ss.scan_byte
  end
  @source.byteslice(start, @ss.pos - start)
end

# After: C-level regex is 2-3x faster
ID_REGEX = /[a-zA-Z_][\w-]*\??/
def scan_id = @ss.scan(ID_REGEX)

What did NOT work

Split-based tokenizer — String#split with regex is 2.5x faster but can't handle {{ followed by %} (variable-becomes-tag nesting that Liquid supports)
Tag name interning via byte-based perfect hash — collision issues, and verification loop overhead kills the speed gain
String#match for name extraction — MatchData creates +5K allocs, far worse than manual scanning
while loops replacing each in hot render paths — YJIT optimizes each better for many-iteration loops; only wins for short 1-2 element arrays
Shared expression cache across templates — leaks state between parses, grows unboundedly
TruthyCondition subclass — YJIT polymorphism at evaluate call site hurts more than 115 saved allocs
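The MatchData point in the list above can be illustrated with a small comparison. The `NAME_RE` pattern and the sample markup are hypothetical; the behavior shown (String#match builds a MatchData and sets `$~`, String#match? does neither) is standard Ruby.

```ruby
NAME_RE = /\A\s*([a-zA-Z_][\w-]*)/   # hypothetical name pattern

markup = "  product | upcase"

# String#match returns a MatchData object on success (one extra object per call).
md = markup.match(NAME_RE)
name_via_match = md[1]

# String#match? only answers yes/no and leaves the last-match variable alone.
"reset".match(/x/)                   # failed match: $~ is now nil
simple = markup.match?(NAME_RE)
last_match = $~                      # still nil after match?

puts name_via_match
puts simple
puts last_match.inspect
```

When the captured text is needed, a scanner that returns the substring directly avoids the intermediate MatchData; when only a yes/no answer is needed, `match?` is the cheaper call.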

Benchmark reproduction

cd performance
bundle exec ruby bench_quick.rb   # single run
# or
./auto/autoresearch.sh            # tests + 3-run best-of

Files changed

lib/liquid/cursor.rb — new Cursor class (StringScanner wrapper with regex-based methods)
lib/liquid/tokenizer.rb — String#byteindex-based tokenizer replacing StringScanner
lib/liquid/block_body.rb — Cursor-based tag/variable parsing, regex blank_string?
lib/liquid/variable.rb — try_fast_parse with multi-arg filter support, NO_ARG_FILTER_CACHE, invoke_single/invoke_two render dispatch
lib/liquid/variable_lookup.rb — simple_lookup? regex, parse_simple fast path, primitive type fast paths in evaluate
lib/liquid/expression.rb — byte-level parse_number, conditional strip
lib/liquid/context.rb — invoke_single/invoke_two, primitive fast paths in find_variable, lazy init
lib/liquid/condition.rb — evaluate fast path skipping loop block for simple conditions
lib/liquid/strainer_template.rb — invoke_single/invoke_two dispatch
lib/liquid/tags/if.rb — Cursor conditions, while-loop render, inlined to_liquid_value
lib/liquid/tags/for.rb — Cursor-based lax_parse
lib/liquid/block.rb — cached block_delimiter strings
lib/liquid/registers.rb — lazy @changes hash
lib/liquid/utils.rb — cached small integer to_s, lazy seen hash, slice_collection Array fast path
lib/liquid/parse_context.rb — Cursor instance
lib/liquid/resource_limits.rb — expose last_capture_length for render loop optimization


basicBrogrammer · 2026-03-13T12:32:36Z

auto/autoresearch.md

Were this and auto/bench.sh your only input files? I've only tested autoresearch with a skill for setup. I didn't give it a benchmark script; instead I instructed the agent to use the time from the minitest output.

initially, before building autoresearch

Lewiscowles1986 · 2026-03-15T20:46:49Z

Looks like a lot of failed tests... Is that to be expected?

gokaykucuk · 2026-03-31T09:45:25Z

Hell yeah, when is this going in? We really need some improvements to the rendering speed of a few stores I'm supporting.

PrivateGER · 2026-03-31T10:53:41Z

Hell yeah, when is this going in? We really need some improvements to the rendering speed of a few stores I'm supporting.

I highly doubt your bottleneck is Liquid rendering speed.

gringer · 2026-03-31T12:31:12Z

Heads-up: this PR has been highlighted on Reddit as an example of confusing, poor-quality code.

MohamedAmjed · 2026-04-02T11:23:47Z

Have some balls! merge it! 🚀🚀🚀🚀🚀🚀🚀🚀🚀🚀🚀🚀🚀🚀🚀🚀🚀

Lewiscowles1986 · 2026-04-03T00:01:55Z

@MohamedAmjed I don't think that is the right move given how many tests are failing; this is used by a lot of Ruby code. If you want, you can just have a gem use this, but then you'd also become responsible for its impact on users. Have patience. Maybe folks will use some of this if it proves useful.

MohamedAmjed · 2026-04-03T14:02:11Z

@MohamedAmjed I don't think that is the right move given how many tests are failing; this is used by a lot of Ruby code. If you want, you can just have a gem use this, but then you'd also become responsible for its impact on users. Have patience. Maybe folks will use some of this if it proves useful.

Yeah, I know, it was a jab, haha.
But, seriously, why is this PR still open? I have never come across 92+ commits and +1,607 code changes in a single PR in my corporate or startup jobs.

Lewiscowles1986 · 2026-04-04T22:08:15Z

@MohamedAmjed I have. They are neither common nor best practice, but they can happen. I suggest we leave this PR until there is meaningful progress and refrain from further bump, nudge, or reaction comments. Good instincts have been followed by not merging, and maybe someone can learn from it and take some improvements or ideas forward, or use it as a way to narrow their search for improvements.

tobi added 30 commits March 11, 2026 07:10

add quick benchmark script for autoresearch

4ea835a

replace FullToken regex with manual byte parsing in parse_for_document

3329b09

replace VariableParser regex scan with manual byte parser in VariableLookup

97e6893

add auto/bench.sh: unit tests + liquid-spec + perf benchmark

7aded8e

use getbyte instead of string indexing in whitespace_handler and create_variable

2b78e4b

use equal? for frozen array comparison in Lexer, skip whitespace with \s+

d291e63

avoid unnecessary strip allocation in Expression.parse, use byteslice for string literals

d79b9fa

short-circuit parse_number with first-byte check before regex

fa41224

fast-path String in render_obj_to_output, avoid Utils.to_s dispatch for common case

c1113ad

fast-path variable_lookups: skip mutable string alloc when no dot/bracket follows

1a79cf6

use frozen EMPTY_ARRAY for Variable filters when no filters present

5da2232

fast-path simple variable parsing: skip Lexer/Parser for plain dot-separated lookups

25f9224

replace SIMPLE_VARIABLE regex with byte-level scanner to avoid MatchData

3939d74

fast-path simple if conditions: skip ExpressionsAndOperators scan for single conditions

fe7a2f5

skip TagAttributes scan in for tag when no colon present

6bcc293

fast-path render for filter-less variables: skip render method overhead

f8b0156

unified fast-path Variable parsing: handle both plain lookups and filter chains without full Lexer pass for name

8a92a4e

expose expression_cache/string_scanner via attr_reader, skip regex in filter args without colon

2d3b856

replace For tag Syntax regex with manual byte-level parser

cfa0dfe

avoid empty array allocation in evaluate_filter_expressions for no-arg filters

544d8f1

use getbyte dispatch instead of start_with? in parse_for_document

8240709

return [tag_name, markup, newlines] from parse_tag_token: avoid 2 whitespace string allocs

58d2514

use frozen EMPTY_ARRAY for disabled_tags in Variable

b86143e

hoist write score check out of render loop: skip increment_write_score when no limits active

db43492

skip filter arg splat for no-arg filters, trim render loop comments

283961d

extend fast-path to handle quoted string literal variables (262 more fast-pathed)

17daac9

autoresearch: add autoresearch.md/sh, increase benchmark warmup to 20 iterations

2543fdc

split filter parsing: scan no-arg filters directly, only invoke Lexer when args present

9fd7cec

add security constraint to autoresearch.md, fix strict mode gate

ad98d1f

autoresearch.md: add strategic direction toward single-pass scanner architecture

83037f9

tobi requested a review from ianks March 11, 2026 14:56

tobi added 4 commits March 12, 2026 16:48

Baseline: 3,818µs combined, 24,881 allocs\n\nResult: {"status":"keep","combined_µs":3818,"parse_µs":2722,"render_µs":1096,"allocations":24881}

c09e722

Replace StringScanner tokenizer with String#byteindex — 12% faster parse, no regex overhead for delimiter finding\n\nResult: {"status":"keep","combined_µs":3556,"parse_µs":2388,"render_µs":1168,"allocations":24882}

b7ae55f

Confirmation run: byteindex tokenizer consistently 3,400-3,600µs\n\nResult: {"status":"keep","combined_µs":3464,"parse_µs":2335,"render_µs":1129,"allocations":24882}

e25f2f1

Clean up tokenizer: remove unused StringScanner setup and regex constants\n\nResult: {"status":"keep","combined_µs":3490,"parse_µs":2331,"render_µs":1159,"allocations":24882}

b37fa98

tobi changed the title ~~Performance: 47% faster parse+render, 60% fewer allocations~~ Performance: 52% faster parse+render, 60% fewer allocations Mar 12, 2026

tobi added 3 commits March 12, 2026 17:17

parse_tag_token without StringScanner: pure byte ops avoid reset(token) overhead, -12% combined\n\nResult: {"status":"keep","combined_µs":3350,"parse_µs":2212,"render_µs":1138,"allocations":24882}

f6baeae

update autoresearch docs with current progress

46927b9

Clean confirmation run: 3,314µs (-55% from main), stable\n\nResult: {"status":"keep","combined_µs":3314,"parse_µs":2203,"render_µs":1111,"allocations":24882}

ae9a2e2

tobi changed the title ~~Performance: 52% faster parse+render, 60% fewer allocations~~ Performance: 55% faster parse+render, 60% fewer allocations Mar 12, 2026

tobi added 4 commits March 12, 2026 17:24

Condition#evaluate: skip loop block for simple conditions (no child_relation) — saves 235 allocs\n\nResult: {"status":"keep","combined_µs":3445,"parse_µs":2284,"render_µs":1161,"allocations":24647}

ca327b0

Replace simple_lookup? byte scan with match? regex — 8x faster per call, cleaner code\n\nResult: {"status":"keep","combined_µs":3489,"parse_µs":2353,"render_µs":1136,"allocations":24647}

99454a9

Inline to_liquid_value in If render — avoids one method dispatch per condition evaluation\n\nResult: {"status":"keep","combined_µs":3459,"parse_µs":2318,"render_µs":1141,"allocations":24647}

db348e0

Replace @blocks.each with while loop in If render — avoids block proc allocation per render\n\nResult: {"status":"keep","combined_µs":3496,"parse_µs":2356,"render_µs":1140,"allocations":24530}

b195d09

tobi changed the title ~~Performance: 55% faster parse+render, 60% fewer allocations~~ Performance: 52% faster parse+render, 61% fewer allocations Mar 12, 2026

update autoresearch experiment log

3182b7c

tobi changed the title ~~Performance: 52% faster parse+render, 61% fewer allocations~~ Performance: 53% faster parse+render, 61% fewer allocations Mar 12, 2026

basicBrogrammer reviewed Mar 13, 2026

Jermolene mentioned this pull request Mar 13, 2026

Performance plugin TiddlyWiki/TiddlyWiki5#9728

Draft

sleepy-zone mentioned this pull request Mar 15, 2026

📝 AI Coding 选题 - 2026-03-15 sleepy-zone/daily-report#11

Open

taobojlen mentioned this pull request Mar 18, 2026

perf: reduce detection overhead from ~1.30x to ~1.03x taobojlen/django-zeal#52

Merged

4 tasks

cpakman mentioned this pull request Apr 5, 2026

Minor cleanups to #2056 #2069

Draft

Performance: 53% faster parse+render, 61% fewer allocations #2056
tobi wants to merge 93 commits into main from autoresearch/liquid-perf-2026-03-11


7 participants
