mriffle/nf-filter-fasta

nf-filter-fasta

A Nextflow workflow for filtering a large FASTA file based on the quality PSMs returned by a proteomics search (currently comet + percolator). The quality thresholds that determine which proteins to include are user-tunable parameters.

The workflow currently runs:

  • msconvert (if raw files are used as input)
  • comet
  • filterPin (removes entries that are not rank one from the percolator input)
  • percolator
  • filterFasta (keeps only proteins with at least one quality peptide from comet/percolator)
  • fastaFixer (removes duplicate entries and entries with invalid residues)
  • decoyFastaGenerator.pl from the TPP
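
The msconvert step only applies when raw files are the input, and the spectra_dir parameter description states that raw files are ignored when mzML files are found. A minimal sketch of that selection rule follows. The function name choose_spectra, the .raw and .mzML extensions, and the case-insensitive matching are assumptions made for illustration, not the workflow's actual code.

```python
from pathlib import PurePath
from typing import List, Tuple


def choose_spectra(file_names: List[str]) -> Tuple[str, List[str]]:
    """Pick which spectra files would be searched (hypothetical helper).

    Assumption: files are recognised by extension, compared case-insensitively.
    """
    mzml = [n for n in file_names if PurePath(n).suffix.lower() == ".mzml"]
    if mzml:
        # mzML files take precedence, so no conversion would be needed.
        return "mzML", mzml
    raw = [n for n in file_names if PurePath(n).suffix.lower() == ".raw"]
    # Only raw files present: these would go through msconvert first.
    return "raw", raw


print(choose_spectra(["run1.raw", "run1.mzML", "run2.raw"]))
print(choose_spectra(["run1.raw", "run2.raw"]))
```

With a mixed directory the raw files are simply skipped, which is why converting only some of the raw files ahead of time could leave the rest unsearched under this reading of the rule.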

Parameters

This workflow accepts the following parameters:

  • comet_params - (required) Path to the comet params file to use for the search. See https://raw.githubusercontent.com/mriffle/nf-filter-fasta/main/example_files/comet.params for an example comet.params file.
  • fasta - (required) Path to the original, unfiltered FASTA file.
  • spectra_dir - (required) Path to a directory containing either raw or mzML files. If mzML files are found, raw files will be ignored.
  • email - Address to which a completion email should be sent. Omit this parameter to send no email. Default is to send no email.
  • psm_qvalue_filter - PSMs with a q-value greater than this will be excluded when finding quality peptides. Default: 0.01
  • peptide_qvalue_filter - Peptides with a q-value greater than this will be excluded when finding quality peptides. Default: 0.01
  • distinct_peptide_count - Proteins with fewer than this many distinct peptides will be excluded from the final FASTA. Default: 3
  • decoy_prefix - Generated decoys will have this as a prefix in their name. Default: DEBRUIJN
  • final_fasta_base_name - Use this name as the base name of the generated FASTA. If omitted, the base name of the input FASTA is used.
  • mzml_cache_directory - The cache directory to use when converting raw files to mzML. Default: /data/mass_spec/nextflow/nf-filter-fasta/mzml_cache
  • panorama_cache_directory - The cache directory to use when downloading raw files from PanoramaWeb. Default: /data/mass_spec/nextflow/panorama/raw_cache

How To Use

Use the following command(s) to run the workflow:

  • To ensure the latest version of the workflow is installed:

    nextflow pull -r main mriffle/nf-filter-fasta

  • To run the workflow, specifying parameters on the command line:

    nextflow run -r main mriffle/nf-filter-fasta --comet_params /path/to/comet.params --spectra_dir /path/to/mzml_files --fasta /path/to/file.fasta

  • To run the workflow using a configuration file:

    Create a configuration file (called pipeline.config in this example, but any name works). Any of the parameters above can be set in it, for example:

    params {
      comet_params = '/path/to/comet.params'
      fasta = '/path/to/file.fasta'
      spectra_dir  = '/path/to/mzml_files'
      psm_qvalue_filter = 0.05
    }
    

    Then run the workflow using:

    nextflow run -r main mriffle/nf-filter-fasta -c pipeline.config

Output

The output of the pipeline is placed in the results/nf-filter-fasta directory (relative to where the workflow was run). Assuming your FASTA file is named myname.fasta, the output files include:

  • fasta/myname.filtered.fasta - The FASTA file that has been filtered using comet/percolator results.
  • fasta/myname.filtered.fixed.fasta - The above file after it has been "fixed" (any duplicate entries removed and sequences containing invalid residues removed).
  • fasta/myname.filtered.fixed.plusdecoys.fasta - The above file that has had decoys added.
  • comet/*.pin - The percolator input files generated by the comet search.
  • comet/*.pep.xml - The comet results files.
  • percolator/combined_filtered.pout.xml - The percolator results.
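
For scripting against the results, the FASTA output names listed above can be derived from the input file name. The sketch below assumes that the base name is the input file name without its final extension and that final_fasta_base_name, when set, replaces that base in all three names. Both are assumptions based on the descriptions in this README, and expected_outputs is a hypothetical helper.

```python
from pathlib import Path
from typing import List, Optional


def expected_outputs(
    fasta: str, final_fasta_base_name: Optional[str] = None
) -> List[Path]:
    """List the FASTA files the README says the workflow produces."""
    # Assumed rule: use the override if given, else strip the final extension.
    base = final_fasta_base_name or Path(fasta).stem
    fasta_dir = Path("results") / "nf-filter-fasta" / "fasta"
    return [
        fasta_dir / f"{base}.filtered.fasta",
        fasta_dir / f"{base}.filtered.fixed.fasta",
        fasta_dir / f"{base}.filtered.fixed.plusdecoys.fasta",
    ]


for path in expected_outputs("/path/to/myname.fasta"):
    # The paths are relative to the directory the workflow was run from.
    print(path.as_posix(), "exists" if path.exists() else "missing")
```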

PanoramaWeb Integration

  1. You must first set up your PanoramaWeb credentials. After finding your API key in PanoramaWeb, save it to Nextflow by typing:

    nextflow secrets set PANORAMA_API_KEY "api key from PanoramaWeb"

  2. All file locations that begin with https:// are assumed to be PanoramaWeb WebDAV URLs. To specify PanoramaWeb locations for all input files, the following pipeline.config file could be used:

    params {
        comet_params = 'https://panoramaweb.org/_webdav/FOLDER_PATH/@files/comet.params'
        fasta = 'https://panoramaweb.org/_webdav/FOLDER_PATH/@files/myname.fasta'
        spectra_dir  = 'https://panoramaweb.org/_webdav/FOLDER_PATH/@files/FOLDER_NAME/'
        psm_qvalue_filter = 0.05
    }
    

    Note: not all files need to be in PanoramaWeb. Mixing local and PanoramaWeb files will work.
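
The rule in step 2 is a simple prefix test, sketched below to show how a mixed set of parameters would be classified. The function names are hypothetical, and the exact, case-sensitive match on https:// is an assumption; the workflow's own check may differ.

```python
from typing import Dict, List


def is_panorama_location(location: str) -> bool:
    """True when the location would be treated as a PanoramaWeb WebDAV URL."""
    # Assumption: an exact, case-sensitive "https://" prefix.
    return location.startswith("https://")


def split_locations(params: Dict[str, str]) -> Dict[str, List[str]]:
    """Group parameter names by where their files would be read from."""
    groups: Dict[str, List[str]] = {"panorama": [], "local": []}
    for name, location in params.items():
        key = "panorama" if is_panorama_location(location) else "local"
        groups[key].append(name)
    return groups


mixed = {
    "comet_params": "/path/to/comet.params",
    "fasta": "https://panoramaweb.org/_webdav/FOLDER_PATH/@files/myname.fasta",
    "spectra_dir": "/path/to/mzml_files",
}
print(split_locations(mixed))
```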

About

Nextflow workflow for creating a small FASTA from a large FASTA based on hits found by comet
