Automation System Documentation

Overview

The Batch Compendium features an automated system for discovering, integrating, and maintaining its collection of Windows batch script repositories. The system runs on a schedule roughly every two weeks and delivers its changes as a pull request, so the only routine manual step is reviewing and merging that pull request.

🎯 Key Features

1. Automated Repository Discovery

  • Searches GitHub for high-quality batch script repositories
  • Filters by star count, activity, and quality metrics
  • Removes duplicates automatically
  • Applies multi-factor quality scoring

2. Intelligent Integration

  • Creates organized directory structures
  • Generates comprehensive READMEs for each repository
  • Categorizes repositories automatically
  • Updates collection databases and statistics

3. Upstream Repository Updates

  • Refreshes existing repositories from their original sources
  • Tracks update timestamps with metadata
  • Ensures collection stays current with latest versions
  • Prevents stale or outdated scripts

4. Quality Filtering

  • Multi-factor quality scoring system
  • Checks for suspicious or malicious content
  • Validates repository authenticity
  • Ensures minimum quality standards

5. Automated Documentation

  • Updates README statistics automatically
  • Generates repository index files
  • Creates comprehensive documentation for each repository
  • Maintains categorized listings

🔄 Automation Workflow

Complete Automation Process

┌─────────────────────────────────────────────────────────────┐
│                   Bi-Weekly Automated Run                    │
│                  (Every 2 weeks on Monday)                   │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│ Step 1: Update Existing Repositories from Upstream          │
│ • Fetch latest versions from original GitHub repos          │
│ • Update metadata and timestamps                            │
│ • Refresh batch scripts with newest versions                │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│ Step 2: Discover New Repositories                           │
│ • Search GitHub for batch repos with 50+ stars              │
│ • Query multiple search patterns for comprehensive coverage │
│ • Filter by language, activity, and popularity              │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│ Step 3: Pre-Filter & Duplicate Detection                    │
│ • Check against existing repository database                │
│ • Scan for existing directory structures                    │
│ • Remove duplicates by name, URL, and content               │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│ Step 4: Quality Filtering & Scoring                         │
│ • Apply multi-factor quality assessment                     │
│ • Check for suspicious patterns or malware indicators       │
│ • Validate repository authenticity and maintenance          │
│ • Score based on stars, activity, and documentation         │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│ Step 5: Integration & Organization                          │
│ • Create category-based directory structures                │
│ • Generate comprehensive README for each repository         │
│ • Update collection database (repo_results.csv)             │
│ • Add metadata tracking files                               │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│ Step 6: Documentation Updates                               │
│ • Update main README with new script counts                 │
│ • Regenerate REPO_INDEX.md with latest statistics           │
│ • Update HIGHLY_RATED_REPOS.md listing                      │
│ • Refresh category documentation                            │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│ Step 7: Create Pull Request                                 │
│ • Commit all changes to new branch                          │
│ • Generate comprehensive PR description                     │
│ • Include statistics and validation results                 │
│ • Tag with automation labels                                │
└─────────────────────────────────────────────────────────────┘

📋 Automation Components

Core Scripts

1. automate_discovery.py

Purpose: Main orchestration script that runs the complete automation process

Key Functions:

  • Coordinates all automation steps
  • Manages workflow execution
  • Handles error recovery
  • Generates comprehensive reports

Usage:

python3 automate_discovery.py \
  --min-stars 50 \
  --max-results 100 \
  --update-existing \
  --github-token $GITHUB_TOKEN

2. identify_batch_repos.py

Purpose: Discovers batch script repositories on GitHub

Features:

  • Multiple search query strategies
  • Star count filtering
  • Duplicate removal
  • Rate limit handling
  • CSV output generation

Usage:

python3 identify_batch_repos.py \
  --token $GITHUB_TOKEN \
  --min-stars 50 \
  --max-results 100
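As a rough illustration of the star-count filter, the sketch below builds a request URL for GitHub's repository search endpoint. The function name `build_search_url` and the single query shown are assumptions made for illustration; the queries that identify_batch_repos.py actually issues are not documented here.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://api.github.com/search/repositories"


def build_search_url(min_stars: int, per_page: int = 100, page: int = 1) -> str:
    """Build a repository-search URL for Batchfile repos above a star threshold.

    Hypothetical helper: the real script may combine several query patterns.
    """
    # GitHub search qualifiers: language:<name> and stars:>=<n>.
    query = f"language:Batchfile stars:>={min_stars}"
    params = {
        "q": query,
        "sort": "stars",
        "order": "desc",
        "per_page": min(per_page, 100),  # the API caps page size at 100
        "page": page,
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url(50))
```

The URL can then be fetched with the requests library, passing the token in an `Authorization` header.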

3. process_new_discoveries.py

Purpose: Filters new repositories against existing collection

Features:

  • Duplicate detection (by name, URL, directory)
  • Quality keyword analysis
  • Repository validation
  • Comprehensive filtering reports

Usage:

python3 process_new_discoveries.py \
  --new-repos discovered.csv \
  --existing-repos repo_results.csv \
  --output filtered.csv
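The duplicate check by name, URL, and directory could look roughly like the sketch below. The column names `name` and `url`, and both helper functions, are assumptions for illustration rather than the script's real schema.

```python
def normalize_url(url: str) -> str:
    """Lower-case a repository URL and drop a trailing slash or '.git' suffix."""
    url = url.strip().lower().rstrip("/")
    return url[:-4] if url.endswith(".git") else url


def filter_new(discovered, existing, existing_dirs=()):
    """Return discovered rows that match no known URL, name, or directory.

    Rows are dicts with hypothetical 'name' and 'url' keys.
    """
    seen_urls = {normalize_url(row["url"]) for row in existing}
    seen_names = {row["name"].lower() for row in existing}
    seen_dirs = {d.lower() for d in existing_dirs}

    fresh = []
    for row in discovered:
        url = normalize_url(row["url"])
        name = row["name"].lower()
        if url in seen_urls or name in seen_names or name in seen_dirs:
            continue  # already in the collection, or repeated in this batch
        seen_urls.add(url)
        seen_names.add(name)
        fresh.append(row)
    return fresh
```

Normalizing before comparing means that differences in case, a trailing slash, or a `.git` suffix do not let the same repository through twice.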

4. quality_filter.py

Purpose: Advanced quality assessment and filtering

Features:

  • Multi-factor quality scoring
  • Suspicious pattern detection
  • Malware indicator checks
  • Repository authenticity validation
  • Activity and maintenance checks

Scoring Factors:

  • Star count (popularity)
  • Last update date (maintenance)
  • Description quality
  • Documentation presence
  • Suspicious pattern absence
  • Repository age
  • Fork ratio

5. integrate_repositories.py

Purpose: Integrates new repositories into the collection

Features:

  • Directory structure creation
  • README generation with safety guidelines
  • Metadata file creation
  • Category organization
  • Collection database updates

Usage:

python3 integrate_repositories.py \
  --new-repos filtered.csv \
  --base-path ../../.. \
  --update-collection
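To make the integration step concrete, here is a minimal sketch that creates a category directory, a README with a safety note, and a metadata file. The directory layout, the metadata keys, and the `integrate_repo` function are all assumptions; the only name taken from this document is the `.upstream_metadata.json` file.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

SAFETY_NOTE = "Review every script before running it, and test in a virtual machine first."


def integrate_repo(base_path, category, full_name, url):
    """Create <base>/<category>/<repo>/ with a README and a metadata file.

    Hypothetical layout; integrate_repositories.py may organize files differently.
    """
    repo_dir = Path(base_path) / category / full_name.split("/")[-1]
    repo_dir.mkdir(parents=True, exist_ok=True)

    readme = f"# {full_name}\n\nSource: {url}\n\n## Safety\n\n{SAFETY_NOTE}\n"
    (repo_dir / "README.md").write_text(readme, encoding="utf-8")

    metadata = {
        "source_url": url,  # assumed key
        "integrated_at": datetime.now(timezone.utc).isoformat(),  # assumed key
    }
    (repo_dir / ".upstream_metadata.json").write_text(
        json.dumps(metadata, indent=2), encoding="utf-8"
    )
    return repo_dir
```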

6. update_upstream_repos.py ⭐ NEW

Purpose: Updates existing repositories from their upstream sources

Features:

  • Fetches latest versions from GitHub
  • Metadata tracking with timestamps
  • Smart update detection (checks last commit)
  • Configurable update limits
  • Preservation of metadata files

Usage:

python3 update_upstream_repos.py \
  --base-path ../../.. \
  --github-token $GITHUB_TOKEN \
  --limit 10

Update Logic:

  • Reads .upstream_metadata.json for the last update time
  • Skips repositories updated within the last 30 days
  • For older entries, checks the GitHub API for new commits
  • Updates only if new commits are available
  • Preserves local metadata and organization
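The update decision above can be sketched as a single function. The metadata keys `last_update` and `last_commit`, the function name, and the choice to treat a missing metadata file as "needs update" are assumptions; only the file name and the 30-day threshold come from this document.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path

MAX_AGE = timedelta(days=30)


def needs_update(repo_dir, latest_commit, now=None):
    """Decide whether a local copy should be refreshed from upstream.

    latest_commit is the newest upstream commit SHA, fetched separately
    (for example from the GitHub API).
    """
    now = now or datetime.now(timezone.utc)
    meta_path = Path(repo_dir) / ".upstream_metadata.json"
    if not meta_path.exists():
        return True  # assumption: never-synced repositories are always updated

    meta = json.loads(meta_path.read_text(encoding="utf-8"))
    last_update = datetime.fromisoformat(meta["last_update"])  # assumed key
    if now - last_update < MAX_AGE:
        return False  # refreshed recently enough

    # Old enough to check: update only if upstream has moved on.
    return meta.get("last_commit") != latest_commit  # assumed key
```

Checking the age first avoids an API call for repositories that were refreshed recently, which helps stay inside rate limits.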

7. update_collection.py

Purpose: Updates collection statistics and documentation

Features:

  • Script count updates
  • Category statistics
  • Documentation regeneration
  • Index file updates
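A script count of this kind can be computed by walking the collection tree. The sketch below assumes that "scripts" means files ending in .bat or .cmd; the extensions that update_collection.py really counts are not stated here, and `count_batch_scripts` is a hypothetical name.

```python
from pathlib import Path

BATCH_SUFFIXES = {".bat", ".cmd"}  # assumed definition of a batch script


def count_batch_scripts(root) -> int:
    """Count batch files anywhere under root, ignoring extension case."""
    return sum(
        1
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in BATCH_SUFFIXES
    )
```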

8. generate_highly_rated_docs.py

Purpose: Generates documentation for highly-rated repositories

Features:

  • Sorted by star count
  • Category grouping
  • Statistics and metrics
  • Markdown formatting
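The grouping and sorting described above might be rendered as follows. The record keys (`name`, `url`, `stars`, `category`), the heading text, and the function name are illustrative assumptions, not the script's actual output format.

```python
from collections import defaultdict


def render_highly_rated(repos) -> str:
    """Render repositories as Markdown, grouped by category, most stars first."""
    by_category = defaultdict(list)
    for repo in repos:
        by_category[repo["category"]].append(repo)

    lines = ["# Highly Rated Repositories", ""]
    for category in sorted(by_category):
        lines += [f"## {category}", ""]
        ranked = sorted(by_category[category], key=lambda r: r["stars"], reverse=True)
        for repo in ranked:
            lines.append(f"- [{repo['name']}]({repo['url']}) ({repo['stars']} stars)")
        lines.append("")
    return "\n".join(lines)
```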

GitHub Actions Workflow

File: .github/workflows/discover-repositories.yml

Triggers:

  1. Scheduled: Twice a month, on the 1st and 3rd Monday at 9:00 AM UTC (roughly every 2 weeks)
  2. Manual: Via "Run workflow" button with custom parameters
  3. Push: When automation scripts are updated
  4. Pull Request: To test changes before merging
  5. API Dispatch: Via external triggers
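A "1st and 3rd Monday" schedule is not quite the same as a fixed 14-day interval: the gap from the 3rd Monday of one month to the 1st Monday of the next is sometimes three weeks. The sketch below shows one way to test whether a date qualifies. It is a hypothetical helper for reasoning about the schedule, not a description of how the workflow file implements it.

```python
from datetime import date


def is_first_or_third_monday(day: date) -> bool:
    """Return True if day is the 1st or 3rd Monday of its month."""
    if day.weekday() != 0:  # Monday is 0
        return False
    occurrence = (day.day - 1) // 7 + 1  # days 1-7 are the 1st occurrence, 8-14 the 2nd, ...
    return occurrence in (1, 3)
```

For example, 15 January 2024 (a 3rd Monday) is followed by 5 February 2024 (a 1st Monday), 21 days later.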

Workflow Steps:

  1. Checkout repository
  2. Set up Python environment
  3. Install dependencies (requests library)
  4. Configure Git for commits
  5. Create feature branch
  6. Run upstream updates (limited to 5 repos per run)
  7. Discover new repositories
  8. Pre-filter for duplicates
  9. Apply quality filtering
  10. Integrate new repositories
  11. Generate updated documentation
  12. Validate all changes
  13. Commit changes
  14. Create pull request

Environment Variables:

  • GITHUB_TOKEN: Provided automatically by GitHub Actions
  • MIN_STARS: Configurable via workflow inputs (default: 50)
  • MAX_RESULTS: Configurable via workflow inputs (default: 100)
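A script could pick these values up from the environment with the documented defaults as a fallback. This is only a sketch: `read_int_env` is a hypothetical helper, and the real scripts may receive the values as command-line flags instead.

```python
import os


def read_int_env(name: str, default: int, environ=None) -> int:
    """Read a non-negative integer from the environment, falling back to default."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name, "").strip()
    if not raw:
        return default
    value = int(raw)  # raises ValueError on non-numeric input
    if value < 0:
        raise ValueError(f"{name} must be non-negative, got {value}")
    return value


if __name__ == "__main__":
    min_stars = read_int_env("MIN_STARS", 50)
    max_results = read_int_env("MAX_RESULTS", 100)
    print(min_stars, max_results)
```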

🎮 Manual Control

Running Automation Manually

Complete Automation

cd z.repo_support/scripts
python3 automate_discovery.py \
  --min-stars 50 \
  --max-results 100 \
  --github-token $GITHUB_TOKEN

Discovery Only

cd z.repo_support/scripts
python3 identify_batch_repos.py \
  --token $GITHUB_TOKEN \
  --min-stars 50 \
  --output discovered.csv

Upstream Updates Only

cd z.repo_support/scripts
python3 update_upstream_repos.py \
  --base-path ../../.. \
  --github-token $GITHUB_TOKEN \
  --limit 10

Using Maintenance Script

# From repository root
./maintenance find-repos --min-stars 100
./maintenance update-upstream --limit 5
./maintenance update-count

Testing Changes

Dry Run Mode

python3 automate_discovery.py --dry-run --min-stars 100

Limited Updates

python3 update_upstream_repos.py --limit 5 --base-path ../..

Specific Repositories

python3 update_upstream_repos.py \
  --repos massgravel/Microsoft-Activation-Scripts AveYo/MediaCreationTool.bat

🔐 Security & Safety

Quality Checks

  • Suspicious Pattern Detection: Flags repositories with malware indicators
  • Malicious Content Screening: Checks for crack/hack/virus keywords
  • Repository Validation: Verifies authenticity and legitimacy
  • Activity Monitoring: Ensures repositories are actively maintained
  • Documentation Verification: Confirms proper documentation exists
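Keyword screening of the kind listed above can be sketched with a regular expression. The three keywords come from this document; the matching rules and the `flagged_keywords` function are assumptions, and quality_filter.py may use a longer list or different logic.

```python
import re

WARNING_KEYWORDS = ("crack", "hack", "virus")

# Match a keyword at the start of a word, so "cracked" matches but "firecracker" does not.
_PATTERN = re.compile(r"\b(" + "|".join(WARNING_KEYWORDS) + r")\w*", re.IGNORECASE)


def flagged_keywords(text) -> list:
    """Return the sorted warning keywords found in a name or description."""
    return sorted({match.group(1).lower() for match in _PATTERN.finditer(text or "")})
```

A match is a signal for closer review rather than proof of malicious content: a prefix match also flags harmless words such as "hackathon".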

Safety Guidelines in Generated READMEs

Each integrated repository includes:

  • ⚠️ Safety warnings and guidelines
  • 🔒 Recommendations to review scripts before execution
  • 🧪 Suggestions to test in safe environments (VMs)
  • 💾 Backup reminders before running system modification scripts
  • 🛡️ Antivirus scan recommendations

📊 Quality Scoring System

Repositories are scored on multiple factors:

Star Score (30%)

  • 1000+ stars: Full points
  • 100-999 stars: Scaled score
  • 50-99 stars: Minimum score

Freshness Score (20%)

  • Updated within 6 months: Full points
  • 6-12 months: Reduced score
  • Older than 12 months: Lowest score

Description Quality (15%)

  • Excellent keywords: Bonus points
  • Good keywords: Standard points
  • Warning keywords: Penalty
  • Missing description: Reduced score

Documentation Score (15%)

  • Comprehensive README: Full points
  • Basic README: Reduced score
  • No README: Minimum score

Security Score (20%)

  • No suspicious patterns: Full points
  • Warning patterns: Reduced score
  • Bad patterns: Rejected
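The five weights above sum to 100%, so the overall score can be read as a weighted average of per-factor scores. The sketch below uses the documented weights; the 0-to-1 scale, the linear scaling between 100 and 1000 stars, the minimum star score of 0.2, and the function names are assumptions made for illustration.

```python
WEIGHTS = {
    "stars": 0.30,
    "freshness": 0.20,
    "description": 0.15,
    "documentation": 0.15,
    "security": 0.20,
}
MIN_STAR_SCORE = 0.2  # assumed value for the "minimum score" tier


def star_score(stars: int) -> float:
    """Map a star count to a 0-1 score following the three documented tiers."""
    if stars >= 1000:
        return 1.0
    if stars < 100:
        return MIN_STAR_SCORE
    # Assumed linear scaling across the 100-999 tier.
    return MIN_STAR_SCORE + (1.0 - MIN_STAR_SCORE) * (stars - 100) / 900


def total_score(parts: dict, rejected: bool = False):
    """Combine per-factor scores (each 0-1) into one weighted score.

    Returns None when bad patterns were found, since those repositories
    are rejected outright rather than scored.
    """
    if rejected:
        return None
    return sum(WEIGHTS[name] * parts[name] for name in WEIGHTS)
```

For instance, `star_score(550)` falls halfway through the scaled tier and yields 0.6 under these assumptions.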

🎯 Configuration

Minimum Star Counts

  • Default: 50 stars (baseline popularity threshold)
  • High Quality: 100+ stars (popular repositories)
  • Top Tier: 1000+ stars (widely trusted)

Update Frequency

  • Scheduled Runs: Bi-weekly (every 2 weeks)
  • Upstream Updates: Limited to 5 repos per automated run
  • Manual Triggers: Available anytime via GitHub Actions

Processing Limits

  • Discovery: 100 repositories per search (configurable)
  • Upstream Updates: 5 repositories per automated run (prevents timeouts)
  • Manual Runs: Unlimited (configurable via --limit)

📝 Maintenance

Regular Tasks (Automated)

  • ✅ Repository discovery (bi-weekly)
  • ✅ Upstream updates (bi-weekly, limited)
  • ✅ Documentation updates (automatic)
  • ✅ Script count updates (automatic)
  • ✅ Quality filtering (automatic)

Occasional Tasks (Manual)

  • Review and merge automation PRs
  • Adjust quality thresholds if needed
  • Update category structures
  • Refine search queries
  • Clean up outdated repositories

Monitoring

  • Check GitHub Actions workflow status
  • Review automation PR summaries
  • Monitor for failures or errors
  • Validate quality of integrated repositories

🚀 Future Enhancements

Potential improvements to the automation system:

  1. Enhanced Update Intelligence

    • Track specific file changes
    • Selective file updates
    • Conflict resolution
  2. Advanced Quality Metrics

    • Community feedback integration
    • Usage statistics
    • Download counts
    • Issue resolution rate
  3. Improved Categorization

    • Machine learning-based categorization
    • Multi-category support
    • Dynamic category creation
  4. Extended Monitoring

    • Repository health checks
    • Broken link detection
    • License compliance verification
    • Dependency vulnerability scanning


❓ Troubleshooting

Common Issues

Automation Workflow Not Running

  • Check if GitHub Actions are enabled
  • Verify workflow file syntax
  • Check repository permissions

Discovery Finding No Repositories

  • Verify GitHub token is valid
  • Check API rate limits
  • Adjust search parameters
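When checking rate limits, the `x-ratelimit-remaining` and `x-ratelimit-reset` headers on a GitHub API response show how many requests are left and when the quota resets (as a Unix timestamp). The helper below is a hypothetical sketch for interpreting them; it is not taken from the project's scripts.

```python
import time


def seconds_until_reset(headers, now=None) -> float:
    """Return how long to wait before retrying, or 0.0 if requests remain."""
    # Header names are case-insensitive, so normalize the keys first.
    lowered = {key.lower(): value for key, value in headers.items()}
    remaining = int(lowered.get("x-ratelimit-remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_at = float(lowered.get("x-ratelimit-reset", "0"))
    now = time.time() if now is None else now
    return max(0.0, reset_at - now)
```

With the requests library, `seconds_until_reset(response.headers)` gives a wait time that can be passed to `time.sleep` before the next attempt.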

Upstream Updates Failing

  • Verify network connectivity
  • Check repository accessibility
  • Ensure valid GitHub URLs

Integration Errors

  • Check directory permissions
  • Verify CSV file formats
  • Ensure category directories exist

Getting Help

  • Review workflow logs in GitHub Actions
  • Check automation result files
  • Examine error messages in PR descriptions
  • Consult individual script help: python3 script.py --help

Note: This automation system is designed to be largely hands-off. Once configured, it discovers, filters, and integrates repositories on its own schedule; the remaining manual work is reviewing and merging the pull requests it opens, plus the occasional tasks listed under Maintenance.