feat: add src/regex.ts core module + aggregate() regex filter #111

@shouze

Context

Part of #110 (EPIC: Support regex syntax /pattern/ in search queries) — step 1 of 2.

The GitHub REST API does not support regex queries. This issue implements the pure-function core that makes regex work:

  • detect regex syntax
  • derive a meaningful API query algorithmically
  • filter API results locally with the original regex

Scope

New module: src/regex.ts

Pure functions only — no I/O.

/** Returns true if the query contains a /pattern/ or /pattern/flags token. */
export function isRegexQuery(q: string): boolean

/**
 * Given a raw query string (possibly mixing GitHub qualifiers and a /regex/ token),
 * returns:
 * - apiQuery: the query safe to send to the GitHub REST API
 * - regexFilter: the compiled RegExp to apply locally on text_matches.fragment
 * - warn: set when no exploitable literal term could be extracted (user should use --regex-hint)
 */
export function buildApiQuery(q: string): {
  apiQuery: string;
  regexFilter: RegExp | null;
  warn?: string;
}
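
A minimal sketch of how isRegexQuery could be written. The token shape is an assumption, not something this issue specifies: the pattern must start at the beginning of the query or after whitespace, may contain escaped characters but no bare "/", and must be followed by whitespace or the end of the query. The constant name REGEX_TOKEN is hypothetical.

```typescript
// Assumed token shape: "/" + body + "/" + optional flags, delimited by
// whitespace or the ends of the query. Escaped characters (e.g. "\/") are
// allowed in the body; a bare "/" is not. Illustrative only.
const REGEX_TOKEN = /(?:^|\s)\/((?:\\.|[^\/\\])+)\/([a-z]*)(?=\s|$)/;

/** Returns true if the query contains a /pattern/ or /pattern/flags token. */
function isRegexQuery(q: string): boolean {
  // The regex has no "g" flag, so test() is stateless across calls.
  return REGEX_TOKEN.test(q);
}

console.log(isRegexQuery("/useState/"));                    // true
console.log(isRegexQuery("filename:package.json /axios/")); // true
console.log(isRegexQuery("path:src/lib/utils axios"));      // false
console.log(isRegexQuery("plain text query"));              // false
```

Requiring a whitespace or start-of-string boundary before the opening slash is what keeps qualifier values such as path:src/lib/utils from being mistaken for a regex.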

buildApiQuery() algorithm:

  1. Tokenize the raw query: separate qualifier:value tokens (e.g. filename:, language:, path:) from the /pattern/flags token
  2. Preserve all qualifier tokens unchanged
  3. On the regex pattern:
    • Top-level alternation A|B|C (not nested inside [...] or (...)) → A OR B OR C
    • Otherwise → extract all unescaped literal sequences (ignoring .*, \s, \d, [...], quantifiers +?*{n}, anchors ^$) → pick the longest one
    • If the extracted term is shorter than 3 characters → set warn and return an empty term
  4. Reconstruct: "<qualifiers> <apiTerm>".trim()
  5. Compile the original pattern into a RegExp (with extracted flags, excluding g)
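
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the final implementation: the helper names (REGEX_TOKEN, splitTopLevel, literalRuns, longestLiteral) are hypothetical, ties between equally long literals go to the last one (chosen only so that the old-lib case in the acceptance table comes out as listed), and an alternation is abandoned with a warning if any branch lacks a 3-character literal. The sketch does not handle literals inside optional groups such as (abc)? or named groups.

```typescript
// Hypothetical sketch: helper names and the token regex are illustrative.
const REGEX_TOKEN = /(?:^|\s)\/((?:\\.|[^\/\\])+)\/([a-z]*)(?=\s|$)/;
const META = ".*+?^$()|";

/** Splits on "|" only at depth 0, i.e. outside [...] and (...). */
function splitTopLevel(pattern: string): string[] {
  const parts: string[] = [];
  let current = "";
  let depth = 0;
  let inClass = false;
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern.charAt(i);
    if (ch === "\\") {
      current += ch + pattern.charAt(i + 1); // keep escape pairs intact
      i++;
      continue;
    }
    if (inClass) {
      if (ch === "]") inClass = false;
    } else if (ch === "[") inClass = true;
    else if (ch === "(") depth++;
    else if (ch === ")") depth--;
    else if (ch === "|" && depth === 0) {
      parts.push(current);
      current = "";
      continue;
    }
    current += ch;
  }
  parts.push(current);
  return parts;
}

/** Collects maximal runs of plain characters, skipping escapes, classes and metacharacters. */
function literalRuns(pattern: string): string[] {
  const runs: string[] = [];
  let run = "";
  let inClass = false;
  const flush = () => {
    if (run) runs.push(run);
    run = "";
  };
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern.charAt(i);
    if (ch === "\\") { flush(); i++; continue; }               // \s, \d, \( ... end the run
    if (inClass) { if (ch === "]") inClass = false; continue; }
    if (ch === "[") { flush(); inClass = true; continue; }
    if (ch === "{") {                                          // {n,m}: previous char may repeat 0 times
      run = run.slice(0, -1);
      flush();
      const close = pattern.indexOf("}", i);
      if (close !== -1) i = close;
      continue;
    }
    if (META.includes(ch)) {
      if (ch === "*" || ch === "?") run = run.slice(0, -1);    // previous char is optional
      flush();
      continue;
    }
    run += ch;
  }
  flush();
  return runs;
}

/** Longest literal run; ">=" keeps the last one on ties (an assumption). */
function longestLiteral(pattern: string): string {
  return literalRuns(pattern).reduce((best, r) => (r.length >= best.length ? r : best), "");
}

function buildApiQuery(q: string): { apiQuery: string; regexFilter: RegExp | null; warn?: string } {
  const m = REGEX_TOKEN.exec(q);
  if (!m) return { apiQuery: q, regexFilter: null };           // plain query: no-op
  const pattern = m[1] ?? "";
  const flags = (m[2] ?? "").replace(/g/g, "");                // strip "g"
  const rest = (q.slice(0, m.index) + " " + q.slice(m.index + m[0].length))
    .replace(/\s+/g, " ")
    .trim();                                                   // qualifiers, unchanged
  let regexFilter: RegExp | null = null;
  let warn: string | undefined;
  try {
    regexFilter = new RegExp(pattern, flags);
  } catch {
    warn = "invalid regex";
  }
  const terms = regexFilter ? splitTopLevel(pattern).map((p) => longestLiteral(p)) : [];
  const usable = terms.length > 0 && terms.every((t) => t.length >= 3);
  if (!usable && !warn) warn = "no literal term of 3+ characters; consider --regex-hint";
  const apiQuery = `${rest} ${usable ? terms.join(" OR ") : ""}`.trim();
  return warn ? { apiQuery, regexFilter, warn } : { apiQuery, regexFilter };
}

console.log(buildApiQuery("/TODO|FIXME|HACK/").apiQuery);               // TODO OR FIXME OR HACK
console.log(buildApiQuery("/(import|require).*someLongLib/").apiQuery); // someLongLib
```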

Extend src/aggregate.ts

Add an optional regexFilter?: RegExp parameter to aggregate(). When provided, a CodeMatch is kept only if at least one of its text_matches fragments matches the regex.
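
As a rough illustration of the filtering rule, the sketch below groups matches by repository and drops any match with no fragment matching the regex. The TextMatch and CodeMatch shapes, the grouping by repo, and the name aggregateSketch are assumptions made for the example. The real types live in src/types.ts and the real aggregate() signature may differ.

```typescript
// Hypothetical minimal shapes; the project's actual CodeMatch may have more fields.
interface TextMatch {
  fragment: string;
}
interface CodeMatch {
  repo: string;
  path: string;
  text_matches: TextMatch[];
}

/** Groups matches by repo; with regexFilter set, keeps a match only if some fragment matches. */
function aggregateSketch(matches: CodeMatch[], regexFilter?: RegExp): Map<string, CodeMatch[]> {
  const byRepo = new Map<string, CodeMatch[]>();
  for (const match of matches) {
    // test() is only stateless when the regex has no "g" (or "y") flag,
    // which is why the "g" flag is stripped before compiling.
    if (regexFilter && !match.text_matches.some((tm) => regexFilter.test(tm.fragment))) {
      continue;
    }
    const group = byRepo.get(match.repo) ?? [];
    group.push(match);
    byRepo.set(match.repo, group);
  }
  return byRepo;
}

const sample: CodeMatch[] = [
  { repo: "acme/web", path: "src/api.ts", text_matches: [{ fragment: 'import axios from "axios"' }] },
  { repo: "acme/cli", path: "README.md", text_matches: [{ fragment: "uses fetch for HTTP calls" }] },
];

console.log([...aggregateSketch(sample, /from.*['"]axios/).keys()]); // [ "acme/web" ]
console.log(aggregateSketch(sample).size);                           // 2
```

A repo whose matches are all filtered out never gets a map entry, so it disappears from the results without a separate pruning pass.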

Update src/types.ts if needed

If the new regexFilter parameter requires a shared type, add it to src/types.ts first (type-first approach per AGENTS.md).

Acceptance criteria

src/regex.test.ts — all cases must pass

Input query                                 apiQuery                                                 regexFilter                   warn?
------------------------------------------  -------------------------------------------------------  ----------------------------  ------
/from.*['"]axios/                           axios                                                    /from.*['"]axios/
/TODO|FIXME|HACK/                           TODO OR FIXME OR HACK                                    /TODO|FIXME|HACK/
/require\(['"]old-lib['"]\)/                old-lib                                                  /require\(['"]old-lib['"]\)/
filename:package.json /["']axios["']:\s*"/  filename:package.json axios
/[~^]?[0-9]+\.[0-9]+/                       ""                                                                                     ⚠️ set
/useState/                                  useState                                                 /useState/
/(import|require).*someLongLib/             someLongLib (alternation partial → fallback to longest)
/pattern/i                                  pattern                                                  /pattern/i
/pattern/gi                                 pattern                                                  /pattern/i (g stripped)
/[/ (invalid regex)                         ""                                                       null                          ⚠️ set
plain text query                            plain text query (no-op)                                 null

src/aggregate.test.ts — new cases

  • aggregate() with regexFilter=/axios/ keeps only matches where at least one fragment matches
  • aggregate() with regexFilter set but no fragment matches → repo excluded from results
  • aggregate() without regexFilter (undefined) → existing behaviour unchanged

Checklist

  • src/regex.ts created with isRegexQuery and buildApiQuery
  • src/regex.test.ts created, all table cases above pass
  • src/aggregate.ts extended with regexFilter?: RegExp
  • src/aggregate.test.ts extended with regex filter cases
  • src/types.ts updated if new shared types are introduced
  • bun test — full suite green
  • bun run lint — zero errors
  • bun run format:check — no diff
  • bun run knip — no unused exports

Note: C4 diagrams and AGENTS.md are updated in #112 once the full module surface is known.

Branch

feat/regex-core

Labels

enhancement (New feature or request)