97 changes: 94 additions & 3 deletions README.md
@@ -19,6 +19,40 @@ The tool analyzes commit messages, file modifications, and code patterns to prov
```bash
# Basic repository scan with HTML output
commitraider --repo /path/to/repository --output html

# Compact JSON for AI agent consumption
commitraider --repo /path/to/repository --output agent-json --top-n 50

# Analyze with code complexity metrics
commitraider --repo /path/to/repository --output json --stats

# Show only security fixes with CVE references
commitraider --repo /path/to/repository --output json --cve-only

# Export JSON Schema for validation
commitraider --output-schema > agent-report-schema.json

# Ultra-compact mode for tools with character limits (<30k chars)
commitraider --repo /path/to/repository --output agent-json --compact
```

### AI/Agent Integration

For AI assistants that need to analyze repository security:

```bash
# Recommended: use agent-json with bounded output
commitraider --repo /path/to/repo --output agent-json --top-n 20 --stats

# Compact mode
commitraider --repo /path/to/repo --output agent-json --compact

# The agent-json output includes:
# - Overall risk score (0-10)
# - Critical/high finding counts
# - CVE references
# - Risky files (single author, stale, complex)
# - Vulnerable dependencies
```

## Installation
@@ -45,10 +79,15 @@ Usage: commitraider [OPTIONS] --repo <REPO>

Options:
-r, --repo <REPO> Path to Git repository to analyze
-o, --output <OUTPUT> Output format (html, json) [default: html]
-c, --cve-only Show only CVE references
-o, --output <OUTPUT> Output format (html, json, agent-json) [default: html]
-p, --patterns <PATTERNS> Pattern set to use (vuln, memory, crypto, all) [default: vuln]
--output-file <OUTPUT_FILE> Output file name. If not specified, agent-json outputs to stdout
-c, --cve-only Show only CVE references
-s, --stats Include detailed statistics and code complexity analysis
--stale-days <STALE_DAYS> Minimum days since last commit to flag as stale [default: 365]
--top-n <TOP_N> Maximum findings/risk files in agent-json output [default: 50]
--output-schema Output the JSON schema for agent-json format and exit
--compact Use ultra-compact agent-json output (<30k chars). Only applies to --output agent-json
-v, --verbose Enable verbose logging
-t, --threads <THREADS> Number of threads for Rayon parallel vulnerability scanning (0 = auto-detect CPU cores) [default: 0]
-h, --help Print help
@@ -65,7 +104,59 @@ Interactive web-based reports featuring:
- File type distribution and risk categorization

### Structured Data Formats
- **JSON**: Machine-readable output for CI/CD pipeline integration

#### JSON (`--output json`)
Machine-readable output for CI/CD pipeline integration. Note: this output can be very large for big repositories because it includes the complete commit history and file metadata.

#### Agent-JSON (`--output agent-json`) *[Recommended for AI/Agent consumption]*
A compact, bounded JSON format optimized for AI agents and automated tools:

- **10x smaller** than regular JSON on large repositories
- **Bounded output**: `--top-n` limits findings and risk files (default 50)
- **Pre-calculated summaries**: Risk scores, severity levels, and CVE counts
- **Schema validation**: Use `--output-schema` to get the JSON Schema

Example:
```bash
# Basic agent-friendly output (50 items max)
commitraider --repo /path/to/repo --output agent-json

# Limited output for very large repos
commitraider --repo /path/to/repo --output agent-json --top-n 20

# With code complexity metrics
commitraider --repo /path/to/repo --output agent-json --stats

# Get the JSON Schema
commitraider --output-schema > schema.json
```

**Ultra-Compact Mode (`--compact`):**
For tools with strict output limits (e.g., a 30,000-character cap on tool output):

```bash
# Ultra-compact output (~5-20k chars, <30k guaranteed)
commitraider --repo /path/to/repo --output agent-json --compact
```

The `--compact` flag produces a condensed report with:
- Shortened field names (e.g., `v`, `repo`, `risk`, `vulns`, `files`)
- Top 15 vulnerability findings including:
  - Short commit ID (8 chars)
  - Truncated commit message (120 chars max)
  - Risk score and severity
  - Pattern names, severities, and categories
  - CVE references
  - Changed files (basenames only)
- Top 10 risk files
- Summary counts by severity
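
The truncation rules in the list above could be implemented with helpers along these lines. This is a minimal sketch, not CommitRaider's actual code: the function names `short_commit_id`, `truncate_message`, and `basename` are hypothetical, and only the limits (8-character commit IDs, 120-character messages, basenames only) come from the list above.

```rust
use std::path::Path;

/// Shorten a commit hash to its first 8 characters.
/// (Hypothetical helper; commit hashes are ASCII hex, so chars == bytes here.)
fn short_commit_id(full_id: &str) -> String {
    full_id.chars().take(8).collect()
}

/// Truncate a commit message to at most `max_chars` characters.
/// Counting `char`s rather than bytes avoids splitting multi-byte UTF-8 text.
fn truncate_message(message: &str, max_chars: usize) -> String {
    message.chars().take(max_chars).collect()
}

/// Reduce a changed-file path to its basename. Falls back to the original
/// string when the path has no final component (for example "..").
fn basename(path: &str) -> String {
    Path::new(path)
        .file_name()
        .map(|name| name.to_string_lossy().into_owned())
        .unwrap_or_else(|| path.to_string())
}
```

Truncating by `char` count keeps the output valid UTF-8; a byte-based slice such as `&message[..120]` would panic if the cut fell inside a multi-byte character.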

**Agent-JSON Structure:**
- `repository`: Repository metadata (path, commits, files, authors)
- `summary`: High-level risk overview (scores, CVE counts, risk file counts)
- `findings`: Top-N vulnerability findings sorted by risk score
- `risk_files`: High-risk files (complexity, ownership, staleness issues)
- `dependencies`: Outdated/vulnerable dependency information
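
The "top-N findings sorted by risk score" behaviour described above can be sketched as a sort followed by a truncate. The `Finding` struct and `top_findings` function below are assumptions made for illustration; the real types in `output::agent` may look different.

```rust
/// Hypothetical finding record; the real struct in CommitRaider may differ.
#[derive(Debug, Clone, PartialEq)]
struct Finding {
    commit_id: String,
    risk_score: f64,
}

/// Sort findings by descending risk score and keep at most `top_n` of them.
fn top_findings(mut findings: Vec<Finding>, top_n: usize) -> Vec<Finding> {
    // `total_cmp` defines a total order on f64, so a NaN score cannot
    // make the comparison fail the way `partial_cmp().unwrap()` could.
    findings.sort_by(|a, b| b.risk_score.total_cmp(&a.risk_score));
    // `truncate` is a no-op when there are fewer than `top_n` findings.
    findings.truncate(top_n);
    findings
}
```

Bounding the list after sorting means the output size depends on `--top-n` rather than on repository size, which is the property the agent-json format relies on.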

## Detection Capabilities

80 changes: 58 additions & 22 deletions src/main.rs
@@ -14,27 +14,28 @@ mod patterns;
use analysis::CodeAnalyzer;
use config::Config;
use git::GitAnalyzer;
use output::agent::AgentReport;
use output::Reporter;
use patterns::PatternEngine;

#[derive(Parser)]
#[command(author, version, about, long_about = None)]
struct Cli {
/// Repository path to analyze
#[arg(short, long)]
repo: PathBuf,
#[arg(short, long, required_unless_present("output_schema"))]
repo: Option<PathBuf>,

/// Pattern set to use (vuln, memory, crypto, all)
#[arg(short, long, default_value = "vuln")]
patterns: String,

/// Output format (html, json)
/// Output format (html, json, agent-json)
#[arg(short, long, default_value = "html")]
output: String,

/// Output file (report.html|json)
#[arg(long, default_value = "report_commit_raider")]
output_file: String,
/// Output file (report.html|json). If not specified, agent-json outputs to stdout
#[arg(long)]
output_file: Option<String>,

/// Show only CVE references
#[arg(short, long)]
@@ -55,13 +56,38 @@ struct Cli {
/// Number of threads for Rayon parallel vulnerability scanning (0 = auto-detect CPU cores)
#[arg(short, long, default_value = "0")]
threads: usize,

/// Maximum number of findings and risk files to include in agent-json output
#[arg(long, default_value = "50")]
top_n: usize,

/// Output the JSON schema for agent-json format and exit
#[arg(long)]
output_schema: bool,

/// Use ultra-compact agent-json output. Only applies to --output agent-json
#[arg(long)]
compact: bool,
}

#[tokio::main]
async fn main() -> Result<()> {
fn main() -> Result<()> {
let runtime = tokio::runtime::Runtime::new()?;
runtime.block_on(async_main())
}

async fn async_main() -> Result<()> {
let cli = Cli::parse();

// Initialize logging
// Handle schema output - this conflicts with repo-based operations
if cli.output_schema {
AgentReport::print_schema();
return Ok(());
}

// Extract repo path early - clap ensures it's Some via required_unless_present
let repo = cli.repo.expect("--repo is required when not using --output-schema");

// Initialize logging to stderr so stdout stays clean for data output
let level = if cli.verbose {
Level::DEBUG
} else {
@@ -70,6 +96,7 @@ async fn main() -> Result<()> {
tracing_subscriber::fmt()
.with_max_level(level)
.with_target(false)
.with_writer(std::io::stderr)
.init();

if cli.threads > 0 {
@@ -78,23 +105,29 @@ async fn main() -> Result<()> {
.build_global()?;
}

println!(
"{}",
"CommitRaider - Git History Security Scanner"
.bright_cyan()
.bold()
);
// Skip banner for agent-json when outputting to stdout (for clean piping)
let skip_banner = matches!(cli.output.as_str(), "agent-json" | "agent")
&& cli.output_file.is_none();

if !skip_banner {
println!(
"{}",
"CommitRaider - Git History Security Scanner"
.bright_cyan()
.bold()
);
println!(
"Repository: {}",
cli.repo.display().to_string().bright_white()
repo.display().to_string().bright_white()
);
}

let config = Config::load()?;
let pattern_engine = PatternEngine::new(&cli.patterns)?;

let git_analyzer = GitAnalyzer::new(&cli.repo)?;
let git_analyzer = GitAnalyzer::new(&repo)?;
let code_analyzer = CodeAnalyzer::new();
let mut reporter = Reporter::new(&cli.output, &cli.output_file)?;
let mut reporter = Reporter::new(&cli.output, cli.output_file.as_deref())?;

info!("Starting repository analysis...");

@@ -103,7 +136,7 @@ async fn main() -> Result<()> {

let code_stats = if cli.stats {
info!("Stats requested, starting code analysis...");
code_analyzer.analyze(&cli.repo, cli.stale_days).await?
code_analyzer.analyze(&repo, cli.stale_days).await?
} else {
info!("Stats not requested, using default code stats");
// Create minimal code stats when not requested
@@ -113,7 +146,7 @@ async fn main() -> Result<()> {

info!("Starting vulnerability pattern scanning...");
let vulnerabilities = pattern_engine
.scan_repository(&cli.repo, &git_stats)
.scan_repository(&repo, &git_stats)
.await?;
info!(
"Pattern scanning complete, found {} vulnerabilities",
@@ -128,10 +161,13 @@ async fn main() -> Result<()> {
};

reporter
.generate_report(&findings, cli.cve_only, cli.stats)
.generate_report(&findings, cli.cve_only, cli.stats, cli.top_n, cli.compact)
.await?;

println!("\n{}", "Analysis complete!".bright_green().bold());
// Skip completion message for agent-json when outputting to stdout
if !skip_banner {
println!("\n{}", "Analysis complete!".bright_green().bold());
}

Ok(())
}
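
The banner-suppression condition in this change could be isolated into a pure function, which would make the "stdout carries only the report" rule straightforward to unit test. A minimal sketch, assuming the same two inputs used in `async_main`; `should_skip_banner` is a hypothetical helper name and is not part of this diff.

```rust
/// Returns true when human-readable banners should be suppressed so that
/// stdout contains nothing but the agent-json report.
/// (Hypothetical helper mirroring the `skip_banner` expression in the diff.)
fn should_skip_banner(output: &str, output_file: Option<&str>) -> bool {
    // Suppress only for the agent formats, and only when no output file
    // was given, because that is the case where the report goes to stdout.
    matches!(output, "agent-json" | "agent") && output_file.is_none()
}
```

With this shape, the call site would read `let skip_banner = should_skip_banner(&cli.output, cli.output_file.as_deref());`, and both the banner and the completion message could share the same tested predicate.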