The Batch Compendium includes an automated system for discovering, integrating, and maintaining its collection of Windows batch script repositories. The system runs on a bi-weekly schedule without manual intervention and submits its changes as a pull request for review.
- Searches GitHub for high-quality batch script repositories
- Filters by star count, activity, and quality metrics
- Removes duplicates automatically
- Applies multi-factor quality scoring
- Creates organized directory structures
- Generates comprehensive READMEs for each repository
- Categorizes repositories automatically
- Updates collection databases and statistics
- Refreshes existing repositories from their original sources
- Tracks update timestamps with metadata
- Ensures collection stays current with latest versions
- Prevents stale or outdated scripts
- Multi-factor quality scoring system
- Checks for suspicious or malicious content
- Validates repository authenticity
- Ensures minimum quality standards
- Updates README statistics automatically
- Generates repository index files
- Creates comprehensive documentation for each repository
- Maintains categorized listings
┌─────────────────────────────────────────────────────────────┐
│ Bi-Weekly Automated Run │
│ (Every 2 weeks on Monday) │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Step 1: Update Existing Repositories from Upstream │
│ • Fetch latest versions from original GitHub repos │
│ • Update metadata and timestamps │
│ • Refresh batch scripts with newest versions │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Step 2: Discover New Repositories │
│ • Search GitHub for batch repos with 50+ stars │
│ • Query multiple search patterns for comprehensive coverage │
│ • Filter by language, activity, and popularity │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Step 3: Pre-Filter & Duplicate Detection │
│ • Check against existing repository database │
│ • Scan for existing directory structures │
│ • Remove duplicates by name, URL, and content │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Step 4: Quality Filtering & Scoring │
│ • Apply multi-factor quality assessment │
│ • Check for suspicious patterns or malware indicators │
│ • Validate repository authenticity and maintenance │
│ • Score based on stars, activity, and documentation │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Step 5: Integration & Organization │
│ • Create category-based directory structures │
│ • Generate comprehensive README for each repository │
│ • Update collection database (repo_results.csv) │
│ • Add metadata tracking files │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Step 6: Documentation Updates │
│ • Update main README with new script counts │
│ • Regenerate REPO_INDEX.md with latest statistics │
│ • Update HIGHLY_RATED_REPOS.md listing │
│ • Refresh category documentation │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Step 7: Create Pull Request │
│ • Commit all changes to new branch │
│ • Generate comprehensive PR description │
│ • Include statistics and validation results │
│ • Tag with automation labels │
└─────────────────────────────────────────────────────────────┘
Purpose: Main orchestration script that runs the complete automation process
Key Functions:
- Coordinates all automation steps
- Manages workflow execution
- Handles error recovery
- Generates comprehensive reports
Usage:
python3 automate_discovery.py \
--min-stars 50 \
--max-results 100 \
--update-existing \
--github-token $GITHUB_TOKEN

Purpose: Discovers batch script repositories on GitHub
Features:
- Multiple search query strategies
- Star count filtering
- Duplicate removal
- Rate limit handling
- CSV output generation
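The search strategy listed above might look roughly like the following sketch. It only builds GitHub repository search URLs and makes no network calls. The search phrases are hypothetical placeholders, since the script's real query patterns are not documented here; the sketch assumes GitHub's `language:Batchfile` and `stars:>=N` search qualifiers.

```python
from urllib.parse import urlencode

GITHUB_SEARCH_URL = "https://api.github.com/search/repositories"

# Hypothetical search phrases; the real script's patterns may differ.
SEARCH_PHRASES = ["", "windows utility", "batch script"]


def build_search_urls(min_stars=50, per_page=100):
    """Return one search URL per phrase, restricted to Batchfile repositories."""
    urls = []
    for phrase in SEARCH_PHRASES:
        # Combine the free-text phrase with language and star qualifiers.
        query = f"{phrase} language:Batchfile stars:>={min_stars}".strip()
        params = {"q": query, "sort": "stars", "order": "desc", "per_page": per_page}
        urls.append(f"{GITHUB_SEARCH_URL}?{urlencode(params)}")
    return urls
```

Running several phrases and merging the results is one way to get broader coverage than a single query, at the cost of needing duplicate removal afterwards.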
Usage:
python3 identify_batch_repos.py \
--token $GITHUB_TOKEN \
--min-stars 50 \
--max-results 100

Purpose: Filters new repositories against existing collection
Features:
- Duplicate detection (by name, URL, directory)
- Quality keyword analysis
- Repository validation
- Comprehensive filtering reports
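As an illustration of the duplicate detection listed above, here is a minimal sketch that compares newly discovered rows against the existing collection by name and by normalized URL. The `name` and `url` keys are assumptions about the CSV columns, and the directory check is left out.

```python
def normalize_url(url):
    """Lower-case a repository URL and drop a trailing slash or .git suffix."""
    url = url.strip().lower().rstrip("/")
    if url.endswith(".git"):
        url = url[:-4]
    return url


def filter_new_repos(discovered, existing):
    """Return discovered rows whose name and URL are not already known.

    Rows are dicts with hypothetical 'name' and 'url' keys.
    """
    seen_names = {row["name"].lower() for row in existing}
    seen_urls = {normalize_url(row["url"]) for row in existing}
    fresh = []
    for row in discovered:
        name, url = row["name"].lower(), normalize_url(row["url"])
        if name in seen_names or url in seen_urls:
            continue  # already in the collection or seen earlier in this batch
        seen_names.add(name)
        seen_urls.add(url)
        fresh.append(row)
    return fresh
```

Matching on name alone is deliberately strict: two unrelated repositories with the same name would be treated as duplicates, which errs on the side of not importing the same project twice.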
Usage:
python3 process_new_discoveries.py \
--new-repos discovered.csv \
--existing-repos repo_results.csv \
--output filtered.csv

Purpose: Advanced quality assessment and filtering
Features:
- Multi-factor quality scoring
- Suspicious pattern detection
- Malware indicator checks
- Repository authenticity validation
- Activity and maintenance checks
Scoring Factors:
- Star count (popularity)
- Last update date (maintenance)
- Description quality
- Documentation presence
- Suspicious pattern absence
- Repository age
- Fork ratio
Purpose: Integrates new repositories into the collection
Features:
- Directory structure creation
- README generation with safety guidelines
- Metadata file creation
- Category organization
- Collection database updates
Usage:
python3 integrate_repositories.py \
--new-repos filtered.csv \
--base-path ../../.. \
--update-collection

Purpose: Updates existing repositories from their upstream sources
Features:
- Fetches latest versions from GitHub
- Metadata tracking with timestamps
- Smart update detection (checks last commit)
- Configurable update limits
- Preservation of metadata files
Usage:
python3 update_upstream_repos.py \
--base-path ../../.. \
--github-token $GITHUB_TOKEN \
--limit 10

Update Logic:
- Checks .upstream_metadata.json for last update time
- Updates if older than 30 days
- Checks GitHub API for new commits
- Only updates if new content is available
- Preserves local metadata and organization
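The update logic above can be sketched as a single decision function. This is an illustration under stated assumptions, not the script's actual code: the `last_updated` and `last_commit_sha` keys are hypothetical names for fields in .upstream_metadata.json, and the latest commit SHA is assumed to have been fetched from the GitHub API beforehand.

```python
from datetime import datetime, timedelta, timezone

UPDATE_INTERVAL = timedelta(days=30)


def needs_update(metadata, latest_commit_sha, now=None):
    """Decide whether a repository should be refreshed from upstream.

    `metadata` mirrors .upstream_metadata.json; the 'last_updated' (ISO 8601)
    and 'last_commit_sha' keys are hypothetical.
    """
    now = now or datetime.now(timezone.utc)
    last_updated = datetime.fromisoformat(metadata["last_updated"])
    if now - last_updated <= UPDATE_INTERVAL:
        return False  # refreshed within the last 30 days
    # Old enough to check: update only if upstream has a new commit.
    return metadata.get("last_commit_sha") != latest_commit_sha
```

Checking the timestamp first avoids spending an API call on repositories that were refreshed recently.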
Purpose: Updates collection statistics and documentation
Features:
- Script count updates
- Category statistics
- Documentation regeneration
- Index file updates
Purpose: Generates documentation for highly-rated repositories
Features:
- Sorted by star count
- Category grouping
- Statistics and metrics
- Markdown formatting
File: .github/workflows/discover-repositories.yml
Triggers:
- Scheduled: Every 2 weeks (1st and 3rd Monday of each month at 9:00 AM UTC)
- Manual: Via "Run workflow" button with custom parameters
- Push: When automation scripts are updated
- Pull Request: To test changes before merging
- API Dispatch: Via external triggers
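To make the schedule concrete, the following sketch checks whether a given date is the 1st or 3rd Monday of its month. It is only an illustration of the rule stated above; the workflow file itself defines the actual schedule. Note that under this rule the gap between a 3rd Monday and the next 1st Monday is two or three weeks, depending on the month.

```python
from datetime import date


def is_scheduled_run_day(day):
    """Return True if `day` is the 1st or 3rd Monday of its month."""
    if day.weekday() != 0:  # Monday is 0
        return False
    # Days 1-7 hold the 1st Monday, days 15-21 hold the 3rd.
    week_of_month = (day.day - 1) // 7 + 1
    return week_of_month in (1, 3)
```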
Workflow Steps:
- Checkout repository
- Set up Python environment
- Install dependencies (requests library)
- Configure Git for commits
- Create feature branch
- Run upstream updates (limited to 5 repos per run)
- Discover new repositories
- Pre-filter for duplicates
- Apply quality filtering
- Integrate new repositories
- Generate updated documentation
- Validate all changes
- Commit changes
- Create pull request
Environment Variables:
- GITHUB_TOKEN: Provided automatically by GitHub Actions
- MIN_STARS: Configurable via workflow inputs (default: 50)
- MAX_RESULTS: Configurable via workflow inputs (default: 100)
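A script could pick up these variables with the documented defaults along the following lines. This is a sketch only: the scripts shown in this document receive their settings as command-line flags, so reading the environment directly, and the `load_settings` helper itself, are assumptions.

```python
import os


def load_settings(env=None):
    """Read discovery settings, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return {
        "github_token": env.get("GITHUB_TOKEN", ""),
        "min_stars": int(env.get("MIN_STARS", "50")),
        "max_results": int(env.get("MAX_RESULTS", "100")),
    }
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise in tests.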
cd z.repo_support/scripts
python3 automate_discovery.py \
--min-stars 50 \
--max-results 100 \
--github-token $GITHUB_TOKEN

cd z.repo_support/scripts
python3 identify_batch_repos.py \
--token $GITHUB_TOKEN \
--min-stars 50 \
--output discovered.csv

cd z.repo_support/scripts
python3 update_upstream_repos.py \
--base-path ../../.. \
--github-token $GITHUB_TOKEN \
--limit 10

# From repository root
./maintenance find-repos --min-stars 100
./maintenance update-upstream --limit 5
./maintenance update-count

python3 automate_discovery.py --dry-run --min-stars 100

python3 update_upstream_repos.py --limit 5 --base-path ../..

python3 update_upstream_repos.py \
--repos massgravel/Microsoft-Activation-Scripts AveYo/MediaCreationTool.bat

- Suspicious Pattern Detection: Flags repositories with malware indicators
- Malicious Content Screening: Checks for crack/hack/virus keywords
- Repository Validation: Verifies authenticity and legitimacy
- Activity Monitoring: Ensures repositories are actively maintained
- Documentation Verification: Confirms proper documentation exists
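The keyword screening mentioned above could be approximated as follows. The three keywords come from the list above; the actual screening list, and how matches affect the score, are not documented here, so treat this as a minimal sketch.

```python
import re

# Keywords named in the list above; the real screening list may be longer.
SUSPICIOUS_KEYWORDS = ("crack", "hack", "virus")
_PATTERN = re.compile(r"\b(" + "|".join(SUSPICIOUS_KEYWORDS) + r")\w*", re.IGNORECASE)


def find_suspicious_keywords(text):
    """Return the sorted, lower-cased keywords matched in `text`."""
    return sorted({m.group(1).lower() for m in _PATTERN.finditer(text or "")})
```

Anchoring each keyword at a word boundary means "hacking" is flagged but "antivirus" is not, which reduces false positives on legitimate security tools.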
Each integrated repository includes:
- ⚠️ Safety warnings and guidelines
- 🔒 Recommendations to review scripts before execution
- 🧪 Suggestions to test in safe environments (VMs)
- 💾 Backup reminders before running system modification scripts
- 🛡️ Antivirus scan recommendations
Repositories are scored on multiple factors:
- Star count (popularity):
  - 1000+ stars: Full points
  - 100-999 stars: Scaled score
  - 50-99 stars: Minimum score
- Last update date (maintenance):
  - Updated within 6 months: Full points
  - 6-12 months: Reduced score
  - Older: Lower score
- Description quality:
  - Excellent keywords: Bonus points
  - Good keywords: Standard points
  - Warning keywords: Penalty
  - Missing description: Reduced score
- Documentation presence:
  - Comprehensive README: Full points
  - Basic README: Reduced score
  - No README: Minimum score
- Suspicious pattern absence:
  - No suspicious patterns: Full points
  - Warning patterns: Reduced score
  - Bad patterns: Rejected
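The factors above can be combined into a single score along these lines. Every point value below is an illustrative assumption, since the document does not state the real weights, and the keyword bonus and penalty for descriptions are left out for brevity.

```python
def score_repository(stars, months_since_update, has_description,
                     readme_level, pattern_level):
    """Return a 0-100 score, or None if the repository is rejected.

    readme_level: 'comprehensive', 'basic', or 'none'
    pattern_level: 'clean', 'warning', or 'bad'
    All point values are illustrative assumptions.
    """
    if pattern_level == "bad":
        return None  # bad patterns reject the repository outright

    if stars >= 1000:
        star_points = 30.0
    elif stars >= 100:
        # Scale linearly from 10 to 30 across 100..999 stars.
        star_points = 10.0 + 20.0 * (stars - 100) / 900
    else:
        star_points = 10.0  # minimum score (50-99 stars after discovery filtering)

    if months_since_update <= 6:
        activity_points = 25
    elif months_since_update <= 12:
        activity_points = 15
    else:
        activity_points = 5

    description_points = 10 if has_description else 5
    readme_points = {"comprehensive": 20, "basic": 10, "none": 2}[readme_level]
    safety_points = 15 if pattern_level == "clean" else 7

    return round(star_points + activity_points + description_points
                 + readme_points + safety_points, 1)
```

Returning `None` for rejected repositories keeps rejection distinct from a merely low score, so callers cannot accidentally admit a flagged repository by lowering a threshold.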
- Default: 50 stars (ensures quality)
- High Quality: 100+ stars (popular repositories)
- Top Tier: 1000+ stars (widely trusted)
- Scheduled Runs: Bi-weekly (every 2 weeks)
- Upstream Updates: Limited to 5 repos per automated run
- Manual Triggers: Available anytime via GitHub Actions
- Discovery: 100 repositories per search (configurable)
- Upstream Updates: 5 repositories per automated run (prevents timeouts)
- Manual Runs: Unlimited (configurable via --limit)
- ✅ Repository discovery (bi-weekly)
- ✅ Upstream updates (bi-weekly, limited)
- ✅ Documentation updates (automatic)
- ✅ Script count updates (automatic)
- ✅ Quality filtering (automatic)
- Review and merge automation PRs
- Adjust quality thresholds if needed
- Update category structures
- Refine search queries
- Clean up outdated repositories
- Check GitHub Actions workflow status
- Review automation PR summaries
- Monitor for failures or errors
- Validate quality of integrated repositories
Potential improvements to the automation system:
- Enhanced Update Intelligence
  - Track specific file changes
  - Selective file updates
  - Conflict resolution
- Advanced Quality Metrics
  - Community feedback integration
  - Usage statistics
  - Download counts
  - Issue resolution rate
- Improved Categorization
  - Machine learning-based categorization
  - Multi-category support
  - Dynamic category creation
- Extended Monitoring
  - Repository health checks
  - Broken link detection
  - License compliance verification
  - Dependency vulnerability scanning
- README.md - Main repository documentation
- CONTRIBUTING_REPOS.md - Contributing guide
- SETUP_AUTOMATION.md - Setup instructions
- WORKFLOW_TRIGGERS_GUIDE.md - Trigger configuration
- z.repo_support/scripts/README.md - Scripts documentation
- Check if GitHub Actions are enabled
- Verify workflow file syntax
- Check repository permissions
- Verify GitHub token is valid
- Check API rate limits
- Adjust search parameters
- Verify network connectivity
- Check repository accessibility
- Ensure valid GitHub URLs
- Check directory permissions
- Verify CSV file formats
- Ensure category directories exist
- Review workflow logs in GitHub Actions
- Check automation result files
- Examine error messages in PR descriptions
- Consult individual script help: python3 script.py --help
Note: This automation system is designed to run hands-off. Once configured, discovery, upstream updates, and documentation refreshes run automatically on schedule; the only routine manual step is reviewing and merging the pull requests it opens.