Command-line tool for downloading biological data from NCBI/ENA with progress tracking and validation.
- Data Summary: Show a detailed summary of sequencing data before download
- Progress Tracking: Real-time progress bars during downloads
- MD5 Verification: Automatic checksum verification after download
- Gzip Validation: Verify gzip file integrity for compressed files
- Incomplete Tracking: Automatically track failed/incomplete downloads
- Skip Completed: Never re-download already completed files
- Retry Support: Easy retry mechanism for incomplete downloads
First, make sure the seq-fetch library is installed:
cd ../seq-fetch
pip install -e .
Then install the CLI:
cd seq-fetch-cli
pip install -e .
This installs the seq-fetch command and makes it available globally.
# Download a single run
seq-fetch download SRR10617884
# Download to specific directory
seq-fetch download SRR10617884 -o ./data
# Download multiple runs
seq-fetch download SRR10617884 SRR10617885 SRR10617886
# Download SRA format instead of FastQ
seq-fetch download SRR10617884 --type sra
# Show run summary
seq-fetch summary SRR10617884
# Show sample summary
seq-fetch summary SAMN14684814 --type sample
Download files for one or more accessions.
seq-fetch download [OPTIONS] ACCESSIONS...
Options:
  -o, --output-dir PATH     Output directory (default: current directory)
  -t, --type [fastq|sra]    File type to download (default: fastq)
  --no-summary              Skip data summary before download
  --no-progress             Skip progress bars
  --no-md5                  Skip MD5 verification
  --no-gzip                 Skip gzip validation
  --max-retries INTEGER     Max retry attempts (default: 3)
  -r, --record-file PATH    Custom incomplete records file
Examples:
# Basic download with all validations
seq-fetch download SRR10617884
# Silent download (no summary, no progress)
seq-fetch download SRR10617884 --no-summary --no-progress
# Download without verification (faster but less safe)
seq-fetch download SRR10617884 --no-md5 --no-gzip
# Download multiple accessions
seq-fetch download SRR10617884 SRR10617885 -o ./data
Show a detailed summary of sequencing data without downloading.
seq-fetch summary ACCESSION [OPTIONS]
Options:
  -t, --type [run|sample|study]    Accession type (default: run)
Example Output:
============================================================
Run Summary: SRR10617884
============================================================
Title: Illumina NovaSeq 6000 sequencing
Platform: ILLUMINA
Instrument: NovaSeq 6000
Library Strategy: RNA-Seq
Sample: SAMN14684814
Study: SRP123456
FastQ Files (2):
- SRR10617884_1.fastq.gz (2.50 GB) [1]
- SRR10617884_2.fastq.gz (2.48 GB) [2]
Total Size: 4.98 GB
============================================================
List all incomplete or failed downloads.
seq-fetch incomplete [OPTIONS]
Options:
  -r, --record-file PATH    Custom incomplete records file
  --by-reason [md5_mismatch|gzip_invalid|download_failed]    Filter by reason
Example Output:
Incomplete Downloads (2 records):
================================================================================
SRR10617884
  File: SRR10617884_1.fastq.gz
  Type: fastq
  Reason: md5_mismatch
  Size: 2.50 GB
  Retries: 3
  Time: 2026-02-27T10:30:00
SRR10617885
  File: SRR10617885.fastq.gz
  Type: fastq
  Reason: gzip_invalid
  Size: 1.80 GB
  Retries: 2
  Time: 2026-02-27T11:00:00
================================================================================
Tip: Use 'seq-fetch retry' to retry downloading these files.
Retry downloading incomplete files.
seq-fetch retry [OPTIONS] [ACCESSIONS]...
Options:
  -o, --output-dir PATH     Output directory
  -r, --record-file PATH    Custom incomplete records file
  --all                     Retry all incomplete downloads
  --type [fastq|sra]        File type (default: fastq)
Examples:
# Retry all incomplete downloads
seq-fetch retry --all
# Retry specific accession
seq-fetch retry SRR10617884
# Retry with custom output directory
seq-fetch retry --all -o ./data
Manually verify a downloaded file.
seq-fetch verify FILE [OPTIONS]
Options:
  --md5 TEXT    Expected MD5 checksum
  --gzip        Also validate gzip format
Examples:
# Verify MD5 only
seq-fetch verify sample.fastq.gz --md5 abc123def456
# Verify gzip integrity
seq-fetch verify sample.fastq.gz --gzip
# Verify both
seq-fetch verify sample.fastq.gz --md5 abc123 --gzip
# 1. First, check the data summary
seq-fetch summary SRR10617884
# 2. Download with all validations
seq-fetch download SRR10617884 -o ./data
# 3. If interrupted or failed, check incomplete files
seq-fetch incomplete
# 4. Retry incomplete downloads
seq-fetch retry --all -o ./data
# Download multiple runs
seq-fetch download SRR10617884 SRR10617885 SRR10617886 -o ./data
# Check if any failed
seq-fetch incomplete
# Retry only the failed ones
seq-fetch retry --all -o ./data
The tool automatically tracks completed files and skips them:
# First run - downloads all files
seq-fetch download SRR10617884 SRR10617885 -o ./data
# Second run - skips already completed files
seq-fetch download SRR10617884 SRR10617885 -o ./data
# Output: "File exists, verifying: ..."
Incomplete downloads are automatically tracked in ~/.seq-fetch/incomplete.json.
The record includes:
- Accession number
- File path
- File type (fastq/sra)
- Failure reason (md5_mismatch, gzip_invalid, download_failed)
- Expected MD5
- File size
- Retry count
- Timestamp
This allows you to:
- Know exactly which files need to be re-downloaded
- Understand why they failed
- Retry only the incomplete files without affecting completed ones
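The records can also be inspected outside the CLI. The sketch below is only an illustration: it assumes the records file is a JSON list of objects with `accession` and `reason` keys. The actual schema of incomplete.json is not described in this README, so the key names and the helper `group_by_reason` are hypothetical and may need adjusting to what the file really contains.

```python
import json
from collections import defaultdict
from pathlib import Path

# Default location named in this README.
RECORD_FILE = Path.home() / ".seq-fetch" / "incomplete.json"


def group_by_reason(record_file=RECORD_FILE):
    """Return {reason: [accession, ...]} from an incomplete-records file.

    Assumption: the file holds a JSON list of objects that carry
    "accession" and "reason" keys (hypothetical schema).
    """
    path = Path(record_file)
    if not path.exists():
        # No records file means nothing has been recorded as incomplete.
        return {}
    with path.open(encoding="utf-8") as handle:
        records = json.load(handle)
    grouped = defaultdict(list)
    for record in records:
        reason = record.get("reason", "unknown")
        grouped[reason].append(record.get("accession"))
    return dict(grouped)
```

Calling `group_by_reason()` with no argument reads the default path; passing the path given to `--record-file` reads a custom records file instead.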
All options can be combined:
seq-fetch download SRR10617884 \
-o ./data \
--no-summary \
--no-progress \
--max-retries 5 \
--record-file ./custom_records.json
If MD5 verification fails repeatedly:
- Check your network connection
- The source file on ENA might be corrupted
- Try downloading with --no-md5 (not recommended)
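To rule out a problem in the tool itself, the checksum can be recomputed independently and compared with the expected value by hand. This is a minimal sketch using Python's standard `hashlib`; the helper names `file_md5` and `md5_matches` are illustrative and are not part of seq-fetch.

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat for multi-GB FastQ files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_matches(path, expected):
    """Compare the file's digest with an expected hex string, ignoring case."""
    return file_md5(path) == expected.strip().lower()
```

If the independently computed digest also differs from the published one after a fresh download, the source file is the more likely culprit than the network.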
If gzip validation fails:
- The download might be incomplete (interrupted)
- Try retrying: seq-fetch retry SRRXXXXXXX
- Check disk space
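A truncated download can be confirmed independently by decompressing the whole stream and seeing whether it ends cleanly. The sketch below uses Python's standard `gzip` module and is comparable in spirit to `gzip -t`; it is not necessarily how seq-fetch performs its own validation, and `gzip_is_valid` is an illustrative name.

```python
import gzip
import zlib


def gzip_is_valid(path, chunk_size=1024 * 1024):
    """Return True if the whole gzip stream decompresses without error."""
    try:
        with gzip.open(path, "rb") as handle:
            # Read to the end so the trailing CRC and length are checked.
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        # OSError covers bad headers and CRC mismatches,
        # EOFError covers streams cut off before the end marker.
        return False
    return True
```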
If completed files are being re-downloaded:
- Make sure you're using the same output directory
- Check the incomplete records: seq-fetch incomplete
- The file might have failed validation (check records for reason)
MIT License
- Built on top of the seq-fetch library
- Data provided by ENA (European Nucleotide Archive)