Skip to content

Minor cleanups to #2056#2069

Draft
cpakman wants to merge 98 commits intomainfrom
liquid-perf-polish
Draft

Minor cleanups to #2056#2069
cpakman wants to merge 98 commits intomainfrom
liquid-perf-polish

Conversation

@cpakman
Copy link
Copy Markdown

@cpakman cpakman commented Apr 5, 2026

I stumbled on #2056 and noticed a few places readability could be improved, as well as one bug. I wanted to give a go at it for fun to see what I could come up with. I set a guardrail of keeping perf within a small margin of the original PR while trying to simplify the content to be easier to reason about and maintain. This rebases the original PR onto latest main and adds 5 additional commits to do some cleanup

  • Fixes an infinite loop in tokenize_fast where a template ending with a stray { would spin forever — the stray-brace scan loop could exit without advancing pos. Regression tests included.

  • Adds a ByteTables module (four 256-entry boolean lookup arrays) to replace scattered inline byte-range comparisons throughout the codebase, and consolidates all byte constants behind Cursor::.

  • Breaks the try_fast_parse monolith in variable.rb into smaller methods.

  • Various dead code removal, idiom cleanup, and comments on the less obvious fast-parse logic.

  • Performance vs autoresearch baseline: combined performance +3%, allocations +1.3%.

tobi added 30 commits April 4, 2026 17:33
tobi added 19 commits April 4, 2026 17:42
…"combined_µs":4121,"parse_µs":2812,"render_µs":1309,"allocations":25535}
…nResult: {"status":"keep","combined_µs":4184,"parse_µs":2921,"render_µs":1263,"allocations":25535}
…ds respond_to? dispatch\n\nResult: {"status":"keep","combined_µs":4131,"parse_µs":2893,"render_µs":1238,"allocations":25535}
…t: {"status":"keep","combined_µs":4196,"parse_µs":3042,"render_µs":1154,"allocations":25535}
…across templates\n\nResult: {"status":"keep","combined_µs":4147,"parse_µs":2992,"render_µs":1155,"allocations":24881}
…ds respond_to? dispatch\n\nResult: {"status":"keep","combined_µs":4103,"parse_µs":2881,"render_µs":1222,"allocations":24881}
…,"combined_µs":3818,"parse_µs":2722,"render_µs":1096,"allocations":24881}
…rse, no regex overhead for delimiter finding\n\nResult: {"status":"keep","combined_µs":3556,"parse_µs":2388,"render_µs":1168,"allocations":24882}
…esult: {"status":"keep","combined_µs":3464,"parse_µs":2335,"render_µs":1129,"allocations":24882}
…ants\n\nResult: {"status":"keep","combined_µs":3490,"parse_µs":2331,"render_µs":1159,"allocations":24882}
…n) overhead, -12% combined\n\nResult: {"status":"keep","combined_µs":3350,"parse_µs":2212,"render_µs":1138,"allocations":24882}
…"status":"keep","combined_µs":3314,"parse_µs":2203,"render_µs":1111,"allocations":24882}
…elation) — saves 235 allocs\n\nResult: {"status":"keep","combined_µs":3445,"parse_µs":2284,"render_µs":1161,"allocations":24647}
…ll, cleaner code\n\nResult: {"status":"keep","combined_µs":3489,"parse_µs":2353,"render_µs":1136,"allocations":24647}
…condition evaluation\n\nResult: {"status":"keep","combined_µs":3459,"parse_µs":2318,"render_µs":1141,"allocations":24647}
… allocation per render\n\nResult: {"status":"keep","combined_µs":3496,"parse_µs":2356,"render_µs":1140,"allocations":24530}
@cpakman cpakman changed the base branch from main to autoresearch/liquid-perf-2026-03-11 April 5, 2026 03:47
@cpakman cpakman changed the base branch from autoresearch/liquid-perf-2026-03-11 to main April 5, 2026 03:47
@cpakman cpakman force-pushed the liquid-perf-polish branch from a8ba498 to 58cb90a Compare April 5, 2026 04:02
The stray-{ else branch in tokenize_fast had a nested while loop that could
exit via its condition (when next_open >= len) without advancing pos.
The outer while pos < len loop would then find the same { again forever.

Reproduced by: Liquid::Template.parse('a{') -- hangs indefinitely.

Replaces the nested scan loop with two String#byteindex calls to find the
next '{%' and '{{' directly, then takes the minimum. Always O(n), eliminates
the nested loop entirely, and impossible to leave pos stranded.

Adds regression tests covering the three inputs that previously hung
plus adjacent stray-brace cases.
@cpakman cpakman force-pushed the liquid-perf-polish branch from 58cb90a to 17e7d8a Compare April 5, 2026 04:26
cpakman added 4 commits April 4, 2026 22:09
Adds ByteTables (lib/liquid/byte_tables.rb): four frozen 256-entry boolean
lookup arrays replacing inline byte-range comparisons throughout:
  IDENT_START, IDENT_CONT, DIGIT, WHITESPACE

Moves require 'liquid/cursor' to immediately after byte_tables in liquid.rb.
Cursor has zero Liquid dependencies; loading it early lets every subsequent
file reference Cursor:: constants directly.

Removes all local byte-constant definitions that duplicated Cursor::
  tokenizer.rb   OPEN_CURLEY / CLOSE_CURLEY / PERCENTAGE
  block_body.rb  OPEN_CURLEY_BYTE / PERCENT_BYTE / DASH_BYTE / CLOSE_CURLEY_BYTE
  expression.rb  DOT / DASH / ZERO / NINE / INTEGER_REGEX / FLOAT_REGEX

Replaces inline byte-range comparisons with ByteTables lookups in:
  cursor.rb, variable_lookup.rb, expression.rb, standardfilters.rb

Removes incidental dead code alongside the constant consolidation:
  tokenizer.rb: require 'strscan', unused string_scanner: param,
                @ss = nil, .to_s.to_str, 'tokenize if @source' guard
  block_body.rb: require 'English'
Decomposes the 211-line try_fast_parse monolith in variable.rb into four
named private methods -- fast_scan_name, fast_resolve_name, fast_scan_filters,
fast_scan_filter_args -- and simplifies simple_variable_markup from a 55-line
byte scanner to a 10-line regex with fast pre-checks.

Generates invoke_single/invoke_two via module_eval in strainer_template.rb
and context.rb instead of duplicating near-identical method bodies. The two
arity variants differ only in parameter lists; generating them makes the
pattern explicit and eliminates copy-paste drift.

Extracts identical 4-line text-token handling block in block_body.rb into
private append_text_token(token, parse_context). The block appeared in
both the stray-{ fallback and the plain-text branch of parse_for_document.

Extracts accessible?(object, key) predicate from the 4-line inline ternary
in variable_lookup.rb that checked hash/array key presence; extracts
liquidize(object, context) from the duplicated to_liquid + context= wiring
that appeared in both the key-found and command-method branches.

Extracts find_in_envs(envs, key, raise_on_not_found:) from
try_variable_find_in_environments in context.rb, which looped over
@Environments then @static_environments with identical loop bodies.

Adds 28 fast-path equivalence tests for Variable (variable_fast_parse_test.rb).
Dead code removed:
  cursor.rb:     parse_for_markup + attr_reader :for_var/collection/reversed
                 (zero callers in lib/; method also incomplete -- no limit/offset)
  for.rb:        Syntax regex, REVERSED_BYTES constant (both unreferenced by
                 lax_parse and strict_parse)
  block_body.rb: BLANK_STRING_REGEX (exact duplicate of WhitespaceOrNothing)

Idioms:
  expression.rb:    parse_number returns nil on failure, not false
                    (Ruby convention for 'no result'; callers use if (num = parse_number))
  block_body.rb:    freeze (idempotent by spec); whitespace_handler marked private;
                    render_node gets rationale comment; redundant comments collapsed
  variable_lookup.rb: COMMAND_METHODS -> %w[]; initialize loop uses each_with_index;
                    removes redundant &. on second clause of &&
  for.rb:           strict2_parse -> alias_method; nil-guard if/else -> ternary;
                    render_else -> ternary; include? checks unconsumed rest only
  condition.rb:     loop/break chain -> while condition.child_relation
  utils.rb:         tightens slice_collection_using_each loop

Comments:
  variable.rb:         backward pipe-walk algorithm; SPACE-only whitespace asymmetry
  cursor.rb:           COMPARISON_OPS identity-map exists for frozen string interning
  if.rb:               include? pre-check is both correctness guard and perf gate
  for.rb:              cursor->regex fallback for limit:/offset: attributes
  strainer_template.rb: __LINE__+1 limitation in module_eval loop
Splits the render loop in block_body.rb on the loop-invariant check_write
condition. The common case (no resource limits) now pays zero branch cost
per node.

Rewrites truncatewords in standardfilters.rb to scan word positions into a
flat int array and builds the result string only when truncation is confirmed.
No string allocation in the common no-truncation case beyond the array itself.

Simplifies rest_blank? in cursor.rb: replaces manual save/skip/restore of
StringScanner position with !@ss.exist?(/\S/). exist? does not advance
position; returns nil when no non-whitespace remains; handles EOS correctly.

Removes the nl newline counter from skip_ws in cursor.rb -- all callers
discarded the return value. NL now handled in the same when-branch as the
other whitespace bytes.
@cpakman cpakman force-pushed the liquid-perf-polish branch from 17e7d8a to 731f64d Compare April 5, 2026 05:14
@isaacbowen
Copy link
Copy Markdown

🥰 thank you

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants