@ccusage/codex: Parser overcounts duplicate token_count rows with unchanged total_token_usage #884

@ayagmar

Hi! I was testing @ccusage/codex against my local ~/.codex/sessions and noticed that it overcounts token usage due to duplicate snapshots emitted by the Codex CLI.

In apps/codex/src/data-loader.ts, the current logic uses last_token_usage first and only falls back to total_token_usage deltas if last_token_usage is missing:

const lastUsage = normalizeRawUsage(info?.last_token_usage);
const totalUsage = normalizeRawUsage(info?.total_token_usage);

let raw = lastUsage;
if (raw == null && totalUsage != null) {
	raw = subtractRawUsage(totalUsage, previousTotals);
}

The problem is that Codex token_count rows usually contain both fields, and the CLI often emits duplicate snapshots where total_token_usage hasn't changed. By trusting last_token_usage unconditionally, the parser counts the same usage twice.

Minimal repro

{"timestamp":"2026-01-01T00:00:00Z","type":"turn_context","payload":{"model":"gpt-5.2"}}
{"timestamp":"2026-01-01T00:00:01Z","type":"event_msg","payload":{"type":"token_count","info":{"total_token_usage":{"input_tokens":100,"cached_input_tokens":20,"output_tokens":30,"reasoning_output_tokens":5,"total_tokens":130},"last_token_usage":{"input_tokens":100,"cached_input_tokens":20,"output_tokens":30,"reasoning_output_tokens":5,"total_tokens":130}}}}
{"timestamp":"2026-01-01T00:00:02Z","type":"event_msg","payload":{"type":"token_count","info":{"total_token_usage":{"input_tokens":100,"cached_input_tokens":20,"output_tokens":30,"reasoning_output_tokens":5,"total_tokens":130},"last_token_usage":{"input_tokens":100,"cached_input_tokens":20,"output_tokens":30,"reasoning_output_tokens":5,"total_tokens":130}}}}

Expected: second row adds 0 because cumulative totals did not advance.
Actual: both rows are counted, so total_tokens is reported as 260 instead of 130.
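To make the arithmetic concrete, here is a rough sketch that replays the two repro rows through a simplified stand-in for the current "prefer last_token_usage" behaviour. It is not the real data-loader.ts code: the `RawUsage` and `TokenCountInfo` types, `reproRows`, and `sumPreferringLast` are names made up for this illustration, and only `total_tokens` is summed.

```typescript
// Hypothetical minimal shapes for this illustration; the real types live in data-loader.ts.
type RawUsage = {
  input_tokens: number;
  cached_input_tokens: number;
  output_tokens: number;
  reasoning_output_tokens: number;
  total_tokens: number;
};

type TokenCountInfo = {
  total_token_usage?: RawUsage;
  last_token_usage?: RawUsage;
};

const snapshot: RawUsage = {
  input_tokens: 100,
  cached_input_tokens: 20,
  output_tokens: 30,
  reasoning_output_tokens: 5,
  total_tokens: 130,
};

// The two token_count rows from the repro, reduced to their `info` payloads.
const reproRows: TokenCountInfo[] = [
  { total_token_usage: snapshot, last_token_usage: snapshot },
  { total_token_usage: snapshot, last_token_usage: snapshot },
];

// Simplified stand-in for the current behaviour: trust last_token_usage when
// present, and fall back to the cumulative delta only when it is missing.
function sumPreferringLast(rows: TokenCountInfo[]): number {
  let previousTotal = 0;
  let sum = 0;
  for (const info of rows) {
    const last = info.last_token_usage;
    const total = info.total_token_usage;
    if (last != null) {
      sum += last.total_tokens;
    } else if (total != null) {
      sum += Math.max(total.total_tokens - previousTotal, 0);
    }
    if (total != null) {
      previousTotal = total.total_tokens;
    }
  }
  return sum;
}

console.log(sumPreferringLast(reproRows)); // 260, although the session total is 130
```

The fallback branch never runs here because both rows carry last_token_usage, which is the case for every row in my local sessions.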

Real data impact

From my ~/.codex/sessions:

  • 803 files analyzed (105,966 token_count rows)
  • All rows had both last_token_usage and total_token_usage
  • 41,624 rows repeated the previous cumulative total
  • For sessions with monotonic cumulative totals, summed deltas should equal the final cumulative total. The current logic matched the final total in only 131 / 732 sessions, whereas a parser strictly following total_token_usage deltas matched 100%.
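The per-session consistency check behind that last number can be sketched roughly as follows. This is an illustration only, not the script I ran: `deltasMatchFinalTotal` is a hypothetical helper, and it assumes each session has already been reduced to its cumulative `total_tokens` values (in file order) plus the per-row amounts a given parser counted.

```typescript
// Returns true/false when the check applies, or null when the session is
// empty or its cumulative totals are not monotonic (check not applicable).
function deltasMatchFinalTotal(
  cumulative: number[],
  countedDeltas: number[],
): boolean | null {
  let previous = 0;
  for (const value of cumulative) {
    if (value < previous) {
      return null; // totals went backwards: reset or rollback
    }
    previous = value;
  }
  if (cumulative.length === 0) {
    return null;
  }
  // `previous` now holds the final cumulative total.
  const summed = countedDeltas.reduce((acc, d) => acc + d, 0);
  return summed === previous;
}

// The repro session: cumulative total_tokens stays at 130 on both rows.
console.log(deltasMatchFinalTotal([130, 130], [130, 130])); // false (last_token_usage counted twice)
console.log(deltasMatchFinalTotal([130, 130], [130, 0])); // true (cumulative deltas)
```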

Suggested fix

When total_token_usage is present:

  1. compute delta from the previous cumulative total
  2. emit nothing if totals did not advance
  3. use last_token_usage only when totals are missing or appear to reset/roll back
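A minimal sketch of those three steps, to show the shape I have in mind. Everything here is an assumption for illustration: `resolveDelta`, `FIELDS`, and `ZERO` are hypothetical names, the `RawUsage` type is a guess at the normalized shape, and treating any decreasing field as a reset/rollback is just one possible heuristic. The real change would need to fit the existing helpers in data-loader.ts.

```typescript
type RawUsage = {
  input_tokens: number;
  cached_input_tokens: number;
  output_tokens: number;
  reasoning_output_tokens: number;
  total_tokens: number;
};

const FIELDS = [
  "input_tokens",
  "cached_input_tokens",
  "output_tokens",
  "reasoning_output_tokens",
  "total_tokens",
] as const;

const ZERO: RawUsage = {
  input_tokens: 0,
  cached_input_tokens: 0,
  output_tokens: 0,
  reasoning_output_tokens: 0,
  total_tokens: 0,
};

// Decide what one token_count row contributes, and return the cumulative
// totals to carry into the next row. A null delta means "emit nothing".
function resolveDelta(
  total: RawUsage | null,
  last: RawUsage | null,
  previousTotals: RawUsage | null,
): { delta: RawUsage | null; previousTotals: RawUsage | null } {
  // Step 3a: totals missing, so last_token_usage is the only signal.
  if (total == null) {
    return { delta: last, previousTotals };
  }

  const base = previousTotals ?? ZERO;

  // Step 3b: a field going backwards is treated as a reset/rollback
  // (assumed heuristic), so fall back to last_token_usage for this row.
  if (FIELDS.some((f) => total[f] < base[f])) {
    return { delta: last, previousTotals: total };
  }

  // Step 1: delta against the previous cumulative total.
  const delta: RawUsage = { ...ZERO };
  for (const f of FIELDS) {
    delta[f] = total[f] - base[f];
  }

  // Step 2: emit nothing when no field advanced.
  const advanced = FIELDS.some((f) => delta[f] > 0);
  return { delta: advanced ? delta : null, previousTotals: total };
}
```

On the repro above, the first row would yield a delta with total_tokens 130 and the second row would yield null, so the session sums to 130.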

If this sounds right, I’m happy to put together a PR for data-loader.ts to fix this.
