feat: add src/regex.ts core module + aggregate() regex filter #111

@shouze

Description

Context

Part of #110 (EPIC: Support regex syntax /pattern/ in search queries) — step 1 of 2.

The GitHub REST API's code search does not support regex queries. This issue implements the pure-function core that emulates them:

  • detect regex syntax
  • derive a meaningful API query algorithmically
  • filter API results locally with the original regex

Scope

New module: src/regex.ts

Pure functions only — no I/O.

/** Returns true if the query contains a /pattern/ or /pattern/flags token. */
export function isRegexQuery(q: string): boolean

/**
 * Given a raw query string (possibly mixing GitHub qualifiers and a /regex/ token),
 * returns:
 * - apiQuery: the query safe to send to the GitHub REST API
 * - regexFilter: the compiled RegExp to apply locally on text_matches.fragment
 * - warn: set when no exploitable literal term could be extracted (user should use --regex-hint)
 */
export function buildApiQuery(q: string): {
  apiQuery: string;
  regexFilter: RegExp | null;
  warn?: string;
}
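
A minimal sketch of how detection and flag handling might look, assuming the regex token starts at the beginning of the query or right after whitespace. Only isRegexQuery is named in this issue; REGEX_TOKEN, RegexToken, splitRegexToken and compileFilter are hypothetical helpers introduced for illustration.

```typescript
// Hypothetical token matcher: "/" at the start or after whitespace, a non-empty
// body in which a backslash always consumes the next character, a closing "/",
// optional flags, then whitespace or end of input.
const REGEX_TOKEN = /(?:^|\s)\/((?:\\.|[^\\])+?)\/([a-z]*)(?=\s|$)/;

/** Returns true if the query contains a /pattern/ or /pattern/flags token. */
export function isRegexQuery(q: string): boolean {
  return REGEX_TOKEN.test(q);
}

/** Hypothetical shape: the regex parts plus whatever else was in the query. */
export interface RegexToken {
  pattern: string;
  flags: string;
  rest: string; // qualifiers and other terms, left untouched
}

/** Separates the /pattern/flags token from the remaining query tokens. */
export function splitRegexToken(q: string): RegexToken | null {
  const m = REGEX_TOKEN.exec(q);
  if (m === null) return null;
  const before = q.slice(0, m.index);
  const after = q.slice(m.index + m[0].length);
  const rest = (before + " " + after).replace(/\s+/g, " ").trim();
  return { pattern: m[1] ?? "", flags: m[2] ?? "", rest };
}

/** Compiles the pattern without the g flag; returns null when it is invalid. */
export function compileFilter(pattern: string, flags: string): RegExp | null {
  const safeFlags = flags.replace(/g/g, "");
  try {
    return new RegExp(pattern, safeFlags);
  } catch {
    return null; // the caller is expected to set warn in this case
  }
}
```

Because the body is matched lazily, a query containing a second "/" followed by whitespace would close the token early; a real implementation may need a stricter tokenizer.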

buildApiQuery() algorithm:

  1. Tokenize the raw query: separate qualifier:value tokens (e.g. filename:, language:, path:) from the /pattern/flags token
  2. Preserve all qualifier tokens unchanged
  3. On the regex pattern:
    • Top-level alternation A|B|C (not nested inside [...] or (...)) → A OR B OR C
    • Otherwise → extract all unescaped literal sequences (ignoring .*, \s, \d, [...], quantifiers +?*{n}, anchors ^$) → pick the longest one
    • If the result is fewer than 3 characters → set warn, return empty term
  4. Reconstruct: "<qualifiers> <apiTerm>".trim()
  5. Compile the original pattern into a RegExp (with extracted flags, excluding g); if the pattern does not compile, return regexFilter: null with an empty apiQuery and set warn
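
Step 3 above could be sketched roughly as follows. The function names (splitTopLevelAlternation, longestLiteral, deriveApiTerm) are hypothetical, and several choices are assumptions rather than part of the spec: escaped characters end a literal run, group contents are skipped entirely, a quantified character is dropped from its run, ties go to the later run (which is what the require/old-lib acceptance case seems to imply), and an alternation with any branch shorter than 3 characters yields an empty term.

```typescript
const MIN_TERM_LENGTH = 3;

/** Splits on "|" that is neither escaped nor inside [...] or (...). */
export function splitTopLevelAlternation(pattern: string): string[] {
  const branches: string[] = [];
  let current = "";
  let depth = 0;
  let inClass = false;
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern.charAt(i);
    if (ch === "\\") {
      current += ch + pattern.charAt(i + 1); // keep the escape pair together
      i++;
      continue;
    }
    if (inClass) {
      if (ch === "]") inClass = false;
    } else if (ch === "[") inClass = true;
    else if (ch === "(") depth++;
    else if (ch === ")") depth = Math.max(0, depth - 1);
    else if (ch === "|" && depth === 0) {
      branches.push(current);
      current = "";
      continue;
    }
    current += ch;
  }
  branches.push(current);
  return branches;
}

/** Returns the index just past the "]" closing the class opened at start. */
function skipClass(p: string, start: number): number {
  let i = start + 1;
  while (i < p.length && p.charAt(i) !== "]") i += p.charAt(i) === "\\" ? 2 : 1;
  return i + 1;
}

/** Returns the index just past the ")" closing the group opened at start. */
function skipGroup(p: string, start: number): number {
  let depth = 0;
  let i = start;
  while (i < p.length) {
    const ch = p.charAt(i);
    if (ch === "\\") { i += 2; continue; }
    if (ch === "[") { i = skipClass(p, i); continue; }
    if (ch === "(") depth++;
    if (ch === ")" && --depth === 0) return i + 1;
    i++;
  }
  return i;
}

/** Longest run of plain characters; on equal length the later run wins. */
export function longestLiteral(pattern: string): string {
  let best = "";
  let run = "";
  const flush = (): void => {
    if (run.length > 0 && run.length >= best.length) best = run;
    run = "";
  };
  let i = 0;
  while (i < pattern.length) {
    const ch = pattern.charAt(i);
    if (ch === "\\") { flush(); i += 2; }                      // \s, \d, \( ...
    else if (ch === "[") { flush(); i = skipClass(pattern, i); }
    else if (ch === "(") { flush(); i = skipGroup(pattern, i); }
    else if ("?*+{".indexOf(ch) !== -1) {
      run = run.slice(0, -1);                                  // quantified char is not guaranteed
      flush();
      const end = ch === "{" ? pattern.indexOf("}", i) : i;
      i = end === -1 ? pattern.length : end + 1;
    } else if (".^$|)".indexOf(ch) !== -1) { flush(); i++; }
    else { run += ch; i++; }
  }
  flush();
  return best;
}

/** A OR B OR C for top-level alternation, otherwise the longest literal; "" means warn. */
export function deriveApiTerm(pattern: string): string {
  const terms = splitTopLevelAlternation(pattern).map(longestLiteral);
  if (terms.some((t) => t.length < MIN_TERM_LENGTH)) return "";
  return terms.join(" OR ");
}
```

Skipping groups wholesale is deliberately conservative: it avoids picking a literal from an optional branch such as (import|require), at the cost of ignoring literals in plain capturing groups.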

Extend src/aggregate.ts

Add an optional regexFilter?: RegExp parameter to aggregate(). When provided, a CodeMatch is kept only if at least one of its text_matches fragments matches the regex.
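
The filtering rule might look like the sketch below. The real aggregate() signature and the CodeMatch type live in src/aggregate.ts and src/types.ts and are not shown in this issue, so TextMatch, CodeMatch, keepMatch and aggregateSketch here are hypothetical stand-ins that only model the fields the rule needs.

```typescript
// Hypothetical, reduced shapes; the project's real types may differ.
interface TextMatch {
  fragment: string;
}

interface CodeMatch {
  repo: string; // e.g. "owner/name"
  path: string;
  text_matches: TextMatch[];
}

/** True when no filter is given or at least one fragment matches it. */
function keepMatch(match: CodeMatch, regexFilter?: RegExp): boolean {
  if (regexFilter === undefined) return true;
  return match.text_matches.some((tm) => {
    // Defensive reset: harmless without g/y, and keeps test() stateless with them.
    regexFilter.lastIndex = 0;
    return regexFilter.test(tm.fragment);
  });
}

/** Groups kept matches by repo; repos with no kept match never get an entry. */
export function aggregateSketch(
  matches: CodeMatch[],
  regexFilter?: RegExp,
): Record<string, CodeMatch[]> {
  const byRepo: Record<string, CodeMatch[]> = {};
  for (const match of matches) {
    if (!keepMatch(match, regexFilter)) continue;
    const bucket = byRepo[match.repo] ?? [];
    bucket.push(match);
    byRepo[match.repo] = bucket;
  }
  return byRepo;
}
```

Filtering before grouping means a repo whose fragments all fail the regex is excluded without a separate pruning pass, and passing undefined leaves the grouping untouched.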

Update src/types.ts if needed

If the new regexFilter parameter requires a shared type, add it to src/types.ts first (type-first approach per AGENTS.md).

Acceptance criteria

src/regex.test.ts — all cases must pass

Input query                                 apiQuery                                                  regexFilter                   warn?
/from.*['"]axios/                           axios                                                     /from.*['"]axios/
/TODO|FIXME|HACK/                           TODO OR FIXME OR HACK                                     /TODO|FIXME|HACK/
/require\(['"]old-lib['"]\)/                old-lib                                                   /require\(['"]old-lib['"]\)/
filename:package.json /["']axios["']:\s*"/  filename:package.json axios
/[~^]?[0-9]+\.[0-9]+/                       ""                                                                                      ⚠️ set
/useState/                                  useState                                                  /useState/
/(import|require).*someLongLib/             someLongLib (alternation partial → fallback to longest)
/pattern/i                                  pattern                                                   /pattern/i
/pattern/gi                                 pattern                                                   /pattern/i (g stripped)
/[/ (invalid regex)                         ""                                                        null                          ⚠️ set
plain text query                            plain text query (no-op)                                  null

src/aggregate.test.ts — new cases

  • aggregate() with regexFilter=/axios/ keeps only matches where at least one fragment matches
  • aggregate() with regexFilter set but no fragment matches → repo excluded from results
  • aggregate() without regexFilter (undefined) → existing behaviour unchanged

Checklist

  • src/regex.ts created with isRegexQuery and buildApiQuery
  • src/regex.test.ts created, all table cases above pass
  • src/aggregate.ts extended with regexFilter?: RegExp
  • src/aggregate.test.ts extended with regex filter cases
  • src/types.ts updated if new shared types are introduced
  • bun test — full suite green
  • bun run lint — zero errors
  • bun run format:check — no diff
  • bun run knip — no unused exports

Note: C4 diagrams and AGENTS.md are updated in #112 once the full module surface is known.

Branch

feat/regex-core

Labels

enhancement: New feature or request
