Skip to content

[refactor] Enhanced Semantic Analysis - Deep Refactoring Opportunities #805

@github-actions

Description

@github-actions

🔧 Enhanced Semantic Function Clustering Analysis

Analysis performed on repository: github/gh-aw-mcpg
Analysis date: 2026-02-07
Total Go files analyzed: 68 files (including root-level files)
Total functions cataloged: ~225+ non-test functions


Executive Summary

This analysis builds upon previous refactoring work and identifies NEW opportunities for code organization improvements. The codebase maintains excellent overall structure, but several patterns emerge that suggest targeted refactoring could improve maintainability and reduce cognitive load.

Key New Findings:

  • Strong functional organization - Most files maintain single responsibility
  • ⚠️ 2 very large files (1000+ lines) - unified.go and connection.go could benefit from extraction
  • ⚠️ Scattered helper files - 8 different "_helpers.go" and "_util*" files across packages
  • ⚠️ Naming inconsistencies - Mix of New* vs Create* patterns for constructors
  • Excellent domain separation - DIFC, guard, and logger packages well-isolated

Function Distribution Summary:

Package Functions Primary Patterns Quality Rating
auth 5 Extract*, Parse*, Validate* ✅ Excellent
cmd 16 Register*, getDefault* ✅ Excellent
config 28+ Load*, Validate*, expand* ✅ Good
difc 35+ New*, Get*, Add*, Check* ✅ Excellent
guard 14 Register*, Get*, Create* ✅ Excellent
launcher 18 GetOrLaunch*, log* ✅ Good
logger 45+ Log*, Init*, Sanitize* ⚠️ Good (scattered)
mcp 24 New*, Send*, try* ⚠️ Good (large file)
middleware 5 apply*, Wrap* ✅ Excellent
server 35+ Handle*, Register*, Get* ⚠️ Good (large file)

High-Priority Opportunities

1. ⚠️ Large File Refactoring Candidates

Priority: 🟡 MEDIUM
Impact: Medium-High - Improved maintainability and testing

Problem: Two Files Exceed 1000 Lines

Files:

  • internal/server/unified.go - 1,025 lines (29 functions, 178 comments)
  • internal/mcp/connection.go - 999 lines (29 functions)

Analysis:

Both files implement cohesive feature sets but have grown to the point where:

  • Navigation becomes challenging
  • Testing requires more setup
  • Pull request reviews are harder
  • Multiple developers may conflict on changes

Recommendation: Extract Logical Components

For internal/server/unified.go:

Consider extracting into multiple files within internal/server/:

  1. unified_registration.go - Tool registration logic

    • Functions: registerAllTools, registerAllToolsSequential, registerAllToolsParallel, registerToolsFromBackend, registerSysTools
    • ~250 lines extracted
  2. unified_session.go - Session management

    • Functions: getSessionID, requireSession, getSessionKeys, ensureSessionDirectory
    • ~150 lines extracted
  3. unified_tools.go - Tool execution logic

    • Functions: callBackendTool, GetToolHandler, GetToolsForBackend
    • ~200 lines extracted
  4. unified.go - Core server struct, constructor, Run, Close

    • Functions: NewUnified, Run, Close, IsShutdown, GetServerIDs, GetServerStatus, GetPayloadSizeThreshold
    • ~400 lines remaining

Benefits:

  • ✅ Each file has a clear, single purpose
  • ✅ Easier to locate registration, session, or tool execution logic
  • ✅ Better test organization (can test components in isolation)
  • ✅ Reduced merge conflicts
  • ✅ Follows Go idiom of grouping related methods by feature in separate files

Estimated Effort: 4-6 hours
Risk: Low-Medium - Pure refactor, no logic changes, tests verify correctness


For internal/mcp/connection.go:

Consider extracting into multiple files within internal/mcp/:

  1. connection_http.go - HTTP-specific connection logic

    • Functions: NewHTTPConnection, initializeHTTPSession, sendHTTPRequest, setupHTTPRequest, HTTP transport methods
    • ~350 lines extracted
  2. connection_transport.go - Transport selection and fallback logic

    • Functions: trySDKTransport, tryStreamableHTTPTransport, trySSETransport, tryPlainJSONTransport
    • ~250 lines extracted
  3. connection_tools.go - Tool/resource/prompt operations

    • Functions: listTools, callTool, listResources, readResource, listPrompts, getPrompt
    • ~200 lines extracted
  4. connection.go - Core Connection struct and SendRequest

    • Functions: NewConnection, SendRequest, SendRequestWithServerID, Close, IsHTTP, core getters
    • ~200 lines remaining

Benefits:

  • ✅ HTTP logic isolated from stdio logic
  • ✅ Transport fallback strategy clearly visible in one file
  • ✅ Tool operations grouped together
  • ✅ Easier to add new transport types or tool operations

Estimated Effort: 4-6 hours
Risk: Low-Medium - Pure refactor, comprehensive test coverage exists


2. ⚠️ Helper File Proliferation

Priority: 🟡 MEDIUM
Impact: Medium - Improved discoverability and consistency

Problem: 8 Different Helper/Util Files Scattered Across Packages

Current Distribution:

internal/timeutil/format.go            - Time formatting utilities
internal/server/http_helpers.go        - HTTP request/session helpers (7 functions)
internal/logger/global_helpers.go      - Global logger helpers
internal/logger/rpc_helpers.go         - RPC message extraction/sanitization
internal/launcher/log_helpers.go       - Launcher-specific logging (6 methods)
internal/testutil/mcptest/             - Test utilities (4 files)

Analysis:

Most helper files serve legitimate domain-specific purposes (✅), but there are opportunities for consolidation:

Recommendation A: Consolidate RPC Helpers

Current:

  • internal/logger/rpc_helpers.go - RPC message utilities
  • internal/logger/rpc_formatter.go - RPC message formatting
  • internal/logger/rpc_logger.go - RPC logging operations

Proposed:

  • Merge into internal/logger/rpc.go or create subdirectory internal/logger/rpc/
    • rpc/helpers.go - Extraction utilities
    • rpc/formatter.go - Formatting
    • rpc/logger.go - Logging operations

Benefits:

  • ✅ Clearer namespace: logger.rpc.* instead of scattered across 3 files
  • ✅ Easier to locate all RPC-related logging
  • ✅ Package-level documentation in one place

Estimated Effort: 2-3 hours
Risk: Low - Move operations, update imports


Recommendation B: Evaluate launcher/log_helpers.go

Current Location: internal/launcher/log_helpers.go

  • 6 private logging methods: logSecurityWarning, logLaunchStart, logEnvPassthrough, etc.
  • Only used within launcher.go (unexported)

Analysis:

  • Current placement is acceptable - Domain-specific logging for launcher
  • ⚠️ Alternative: Could move to internal/logger/ as LogLaunchEvent(event, details...) pattern

Recommendation: Keep as-is - These are tightly coupled to Launcher operations. Moving to logger package would create artificial abstraction.


3. ⚠️ Constructor Naming Inconsistency

Priority: 🟢 LOW
Impact: Low - Developer experience and consistency

Problem: Mixed Constructor Patterns

Three patterns used across codebase:

  1. New* (most common) - 35+ occurrences

    • Examples: NewConnection, NewUnified, NewRegistry, NewLabel
    • Idiomatic Go pattern
  2. Create* - 8 occurrences

    • Examples: CreateHTTPServerForMCP, CreateGuard
    • ⚠️ Often indicates factory pattern vs simple constructor
  3. Init* - 4 occurrences (loggers only)

    • Examples: InitFileLogger, InitJSONLLogger
    • ⚠️ Indicates global state initialization vs object creation

Analysis:

Most usage is consistent, but there are edge cases:

  • CreateHTTPServerForMCP - Could be NewHTTPServer
  • CreateGuard - Correctly named (factory pattern with registry lookup)

Recommendation:

Document the convention in CONTRIBUTING.md:

## Constructor Naming Conventions

Use these patterns consistently:

1. **`New*(args) *Type`** - Standard constructor for new instances
   ```go
   func NewConnection(ctx context.Context) *Connection
   ```

2. **`Create*(args) (Type, error)`** - Factory pattern with registry/config lookup
   ```go
   func CreateGuard(name string) (Guard, error) // looks up registered type
   ```

3. **`Init*(args) error`** - Global state initialization (singletons, loggers)
   ```go
   func InitFileLogger(dir string) error // initializes global logger
   ```

**When to use each:**
- Most constructors → `New*`
- Registry-based factories → `Create*`
- Global initialization → `Init*`

Estimated Effort: 1 hour (documentation only)
Risk: None - No code changes


4. ⚠️ Validation Function Distribution

Priority: 🟢 LOW
Impact: Low-Medium - Code discoverability

Problem: 10+ validate* Functions Scattered

Current Distribution:

internal/config/validation.go:
  - validateServerConfig
  - validateGatewayConfig
  - validateMounts
  - validateContainerID
  
internal/config/validation_schema.go:
  - validateJSONSchema
  - validateCustomSchemas
  - validateServerConfigWithCustomSchemas
  - validateCustomServerConfig
  - validateStandardServerConfig
  - validateStringPatterns

internal/auth/header.go:
  - ValidateAPIKey

internal/server/http_helpers.go:
  - extractAndValidateSession

Analysis:

Most validation functions are correctly placed by domain:

  • ✅ Config validation in config/ package
  • ✅ Schema validation in config/validation_schema.go
  • ✅ Auth validation in auth/ package
  • ✅ Session validation in server/ package

Recommendation: No action needed - Current organization follows domain-driven design. Validation functions belong close to the data they validate.

Alternative (future): If validation grows significantly, consider internal/config/validation/ subdirectory with:

  • validation/config.go - Config structure validation
  • validation/schema.go - JSON schema validation
  • validation/env.go - Environment validation

Trigger for refactor: When validation.go exceeds 500 lines OR 5+ validation files exist


Known Issues (Previously Reported)

These issues were identified in previous analysis (#728) and are tracked separately:

✅ Issue #728-1: TruncateSessionID Misplacement

  • Status: Previously identified
  • Function: auth.TruncateSessionID should move to logger/sanitize/
  • Priority: 🔴 HIGH

✅ Issue #728-2: runDockerInspect Outlier

  • Status: Previously identified, documented as acceptable
  • Function: runDockerInspect in config/validation_env.go
  • Decision: Keep as-is (tightly coupled to validation checks)
  • Priority: 🟢 LOW (documentation only)

Detailed Function Clusters

Cluster 1: Constructor Patterns (35+ functions)

Pattern: New* - Object creation without error handling
Distribution: All packages
Examples:

  • NewConnection, NewHTTPConnection (mcp)
  • NewUnified, NewSession (server)
  • NewRegistry, NewNoopGuard (guard)
  • NewLabel, NewAgentLabels (difc)
  • NewSessionConnectionPool (launcher)

Analysis:Excellent consistency - Most packages follow Go idiom of New* constructors


Cluster 2: Registration Patterns (15+ functions)

Pattern: Register* - Callback or plugin registration
Distribution: cmd, config, guard, server
Examples:

  • RegisterFlag, registerAllFlags, registerFlagCompletions (cmd)
  • RegisterDefaults, RegisterStdinConverter (config)
  • RegisterGuardType, Register (guard)
  • registerAllTools, registerToolsFromBackend, registerGuard (server)

Analysis:Consistent pattern - Registration for extensibility


Cluster 3: Validation Patterns (15+ functions)

Pattern: Validate*, validate*, check*, ensure*
Distribution: config, auth, server
Examples:

  • ValidateExecutionEnvironment, ValidateContainerizedEnvironment (config)
  • validateServerConfig, validateGatewayConfig (config)
  • ValidateAPIKey (auth)
  • extractAndValidateSession (server)

Analysis:Well-distributed by domain - Each validator in appropriate package


Cluster 4: Logging Patterns (20+ functions)

Pattern: Log*, log*, Init*Logger
Distribution: logger package + domain-specific helpers
Examples:

  • LogInfo, LogWarn, LogError, LogDebug (logger)
  • LogRPCMessage* variants (logger)
  • logSecurityWarning, logLaunchStart (launcher)
  • logRuntimeError, logHTTPRequestBody (server)

Analysis: ⚠️ Acceptable with caveat - Core logging centralized, domain helpers correctly placed. RPC helpers could consolidate.


Cluster 5: HTTP/Transport Patterns (12+ functions)

Pattern: Handle*, handle*, try*Transport
Distribution: server, mcp
Examples:

  • HandleHealth, handleUnifiedMCP, handleRoutedMCP (server)
  • trySDKTransport, tryStreamableHTTPTransport, trySSETransport (mcp)
  • sendHTTPRequest, setupHTTPRequest (mcp)

Analysis:Well-organized - HTTP logic isolated from stdio logic


Cluster 6: Accessor Patterns (40+ functions)

Pattern: Get*, Set*, Extract*
Distribution: All packages
Examples:

  • GetServerIDs, GetServerStatus, GetToolsForBackend (server)
  • GetSecrecyTags, GetIntegrityTags, GetAllAgentIDs (difc)
  • Get, Set, Delete, List (launcher pool)
  • ExtractAgentID, ExtractSessionID (auth)

Analysis:Idiomatic Go - Consistent getter/setter naming


Refactoring Implementation Plan

Phase 1: High-Impact Refactoring (2-3 weeks)

Priority 1: Extract unified.go components

  • Create unified_registration.go - Tool registration (~250 lines)
  • Create unified_session.go - Session management (~150 lines)
  • Create unified_tools.go - Tool execution (~200 lines)
  • Refactor unified.go to core functionality (~400 lines)
  • Update tests to reflect new file structure
  • Run full test suite: make test-all
  • Estimated: 6-8 hours

Priority 2: Extract connection.go components

  • Create connection_http.go - HTTP logic (~350 lines)
  • Create connection_transport.go - Transport fallback (~250 lines)
  • Create connection_tools.go - Tool operations (~200 lines)
  • Refactor connection.go to core functionality (~200 lines)
  • Update tests
  • Run full test suite: make test-all
  • Estimated: 6-8 hours

Priority 3: Consolidate RPC logger helpers

  • Create internal/logger/rpc/ subdirectory or merge to rpc.go
  • Move rpc_helpers.go, rpc_formatter.go, rpc_logger.go content
  • Update imports across codebase
  • Run tests: make test
  • Estimated: 2-3 hours

Phase 2: Documentation & Standards (1 week)

Priority 4: Document constructor conventions

  • Add section to CONTRIBUTING.md on constructor naming
  • Document New* vs Create* vs Init* patterns
  • Provide examples from codebase
  • Estimated: 1 hour

Phase 3: Future Enhancements (As Needed)

Priority 5: Monitor validation growth

  • Track when validation.go exceeds 500 lines
  • Consider internal/config/validation/ subdirectory if needed
  • Trigger: File size >500 lines OR 5+ validation files

Architecture Strengths (Unchanged from Previous Analysis)

The codebase demonstrates several architectural strengths that should be maintained:

1. Excellent Package Cohesion

  • Clear package boundaries with single responsibilities
  • No circular dependencies detected
  • Domain-driven design with proper separation

2. Minimal Code Duplication

  • Similar-named functions serve different domains appropriately
  • Generic helpers exist where needed (logger/common.go:initLogger[T])
  • Type-specific wrappers kept minimal

3. Idiomatic Go Patterns

  • Consistent constructor naming (mostly New*)
  • Proper interface usage (io.Closer, Guard, BackendCaller)
  • Domain-driven design with clear boundaries

4. Security-First Design

  • Dedicated logger/sanitize/ package for secret handling
  • DIFC implementation isolated in dedicated package
  • Consistent truncation for safe logging

5. Good Use of Generics

  • logger/common.go:initLogger[T] eliminates logger initialization duplication
  • Type constraints properly used (closableLogger constraint)

Analysis Metadata

  • Analysis Date: 2026-02-07
  • Total Go Files Analyzed: 68 (47 in internal/, 3 in root, 18 in testutil)
  • Total Non-Test Functions: ~225+
  • Semantic Clusters Identified: 6 major patterns
  • High-Priority Issues: 2 (large file extraction)
  • Medium-Priority Issues: 2 (helper consolidation, constructor docs)
  • Low-Priority Issues: 2 (validation monitoring)
  • Detection Method: Semantic analysis + grep pattern matching + manual code review
  • Previous Analysis Reference: #728

Conclusion

The gh-aw-mcpg codebase continues to demonstrate excellent code organization. The new opportunities identified focus on:

  1. Breaking up large files for improved maintainability
  2. Consolidating related helpers for better discoverability
  3. Documenting conventions for consistency

All recommendations are optional enhancements that would improve developer experience but do not indicate architectural problems.

Recommended Action: Prioritize Phase 1 refactoring (unified.go and connection.go extraction) during the next maintenance cycle. This provides the highest ROI for improved maintainability.

Overall Code Quality Rating: ⭐⭐⭐⭐⭐ (5/5) - Exceptionally well-organized codebase with clear opportunities for incremental improvement


References:

AI generated by Semantic Function Refactoring

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions