---
author: Sam White
toc-title: Contents
toc-depth: 5
toc-location: left
layout: post
title: BS-Seq Analysis - Nextflow EpiDiverse SNP Pipeline for Haws Hawaii C.gigas BAMs from Yaamini Base Config
date: '2022-12-15 19:13'
tags:
  - BSseq
  - SNP
  - epidiverse
  - haws
  - Crassostrea gigas
  - Pacific oyster
  - Hawaii
categories:
  - 2022
  - Miscellaneous
---

[Yaamini asked me to run the `epidiverse/snp` pipeline](https://github.com/RobertsLab/resources/issues/1558) (GitHub Issue) on her Haws [_Crassostrea gigas_ (Pacific oyster)](http://en.wikipedia.org/wiki/Pacific_oyster) Hawaii bisulfite sequencing BAMs for SNP identification. I ran a version of this [yesterday (20221214), using a modified config file](https://robertslab.github.io/sams-notebook/posts/2022/2022-12-14-BS-Seq-Analysis---Nextflow-EpiDiverse-SNP-Pipeline-for-Haws-Hawaii-C.gigas-BAMs-from-Yaamini/) to see if there would be a noticeable difference in runtimes. For _this_ run, I just utilized the default, base config file with no modifications.

This was run using BAMs found here:

- [`https://gannet.fish.washington.edu/spartina/project-oyster-oa/Haws/bismark-2/r3644*.deduplicated.sorted.bam`](https://gannet.fish.washington.edu/spartina/project-oyster-oa/Haws/bismark-2/)

Genome FastA was a version of the `cgigas_uk_roslin_v1` genome in which Yaamini appended the mitochondrial sequences:

- [cgigas_uk_roslin_v1_genomic-mito.fa](https://gannet.fish.washington.edu/spartina/project-oyster-oa/Haws/data/cgigas_uk_roslin_v1_genomic-mito.fa) (FastA; 626MB)

Yesterday's run was where I messed around with the [`EpiDiverse/snp`](https://github.com/EpiDiverse/snp) base config file to try to speed things up a bit; this run leaves that file untouched. The job was run on Mox.
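The SBATCH script's comments note two requirements for the genome FastA: the file name has to end in `.fa`, and a FastA index has to sit in the same directory. A minimal pre-flight sketch of that check might look like the following. The `check_reference` function is a hypothetical helper, not part of the actual script, and it assumes the index uses the `<fasta>.fai` name that `samtools faidx` produces.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original SBATCH script): confirm that a
# genome FastA meets the two requirements noted in the script comments.
check_reference() {
  local fasta="$1"

  # Requirement 1: the file name must end with .fa
  if [[ "${fasta}" != *.fa ]]; then
    echo "ERROR: ${fasta} does not end with .fa" >&2
    return 1
  fi

  # The FastA itself must exist and be non-empty.
  if [[ ! -s "${fasta}" ]]; then
    echo "ERROR: ${fasta} is missing or empty" >&2
    return 1
  fi

  # Requirement 2: an index in the same directory as the FastA.
  # ASSUMPTION: the index is named <fasta>.fai (samtools faidx convention).
  if [[ ! -s "${fasta}.fai" ]]; then
    echo "ERROR: no index found; try: samtools faidx ${fasta}" >&2
    return 1
  fi

  echo "OK: ${fasta}"
}

# Example (path taken from the script's genome_fasta variable):
# check_reference "/gscratch/srlab/sam/data/C_gigas/genomes/cgigas_uk_roslin_v1_genomic-mito.fa"
```

Running a check like this before submitting the job would catch a missing index in seconds rather than after the job has sat in the queue.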
SBATCH script (GitHub):

- [20221215-cgig-nextflow-epdiverse-snp-haws-hawaii-base_config.sh](https://github.com/RobertsLab/sams-notebook/blob/master/sbatch_scripts/20221215-cgig-nextflow-epdiverse-snp-haws-hawaii-base_config.sh)

```bash
#!/bin/bash
## Job Name
#SBATCH --job-name=20221215-cgig-nextflow-epdiverse-snp-haws-hawaii-base_config
## Allocation Definition
#SBATCH --account=srlab
#SBATCH --partition=srlab
## Resources
## Nodes
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=12-00:00:00
## Memory per node
#SBATCH --mem=120G
##turn on e-mail notification
#SBATCH --mail-type=ALL
#SBATCH --mail-user=samwhite@uw.edu
## Specify the working directory for this job
#SBATCH --chdir=/gscratch/scrubbed/samwhite/outputs/20221215-cgig-nextflow-epdiverse-snp-haws-hawaii-base_config

# Run EpiDiverse/snp on C.gigas Bismark BAMs generated by Yaamini for Haws Hawaii project.

# Requires a FastA file with extension: .fa

# Requires a FastA index file to be in same directory as FastA.

#### Duplicate of 20221214 run, but this run uses base config to compare run times.

###################################################################################
# These variables need to be set by user

## Directory with BAM(s)
bams_dir="/gscratch/scrubbed/samwhite/data/C_gigas/BSseq"

## Location of EpiDiverse/snp pipeline directory
epi_snp="/gscratch/srlab/programs/epidiverse-pipelines/snp"

## FastA file is required to end with .fa
## Requires FastA index file to be present in same directory as FastA
genome_fasta="/gscratch/srlab/sam/data/C_gigas/genomes/cgigas_uk_roslin_v1_genomic-mito.fa"

## Location of Nextflow
nextflow="/gscratch/srlab/programs/nextflow-21.10.6-all"

## Specify desired/needed version of Nextflow
nextflow_version="20.07.1"

###################################################################################

# Exit script if a command fails
set -e

# Load Anaconda
# Unknown why this is needed, but Anaconda will not run if this line is not included.
. "/gscratch/srlab/programs/anaconda3/etc/profile.d/conda.sh"

# Activate NF-core conda environment
conda activate epidiverse-snp_env

## Run EpiDiverse/snp
NXF_VER=${nextflow_version} \
${nextflow} run \
${epi_snp} \
--input ${bams_dir} \
--reference ${genome_fasta} \
--variants \
--clusters

###################################################################################

# Capture program options
if [[ "${#programs_array[@]}" -gt 0 ]]; then
  echo "Logging program options..."
  for program in "${!programs_array[@]}"
  do
    {
    echo "Program options for ${program}: "
    echo ""
    # Handle samtools help menus
    if [[ "${program}" == "samtools_index" ]] \
    || [[ "${program}" == "samtools_sort" ]] \
    || [[ "${program}" == "samtools_view" ]]
    then
      ${programs_array[$program]}

    # Handle DIAMOND BLAST menu
    elif [[ "${program}" == "diamond" ]]; then
      ${programs_array[$program]} help

    # Handle NCBI BLASTx menu
    elif [[ "${program}" == "blastx" ]]; then
      ${programs_array[$program]} -help
    fi
    ${programs_array[$program]} -h
    echo ""
    echo ""
    echo "----------------------------------------------"
    echo ""
    echo ""
    } &>> program_options.log || true

    # If MultiQC is in programs_array, copy the config file to this directory.
    if [[ "${program}" == "multiqc" ]]; then
      cp --preserve ~/.multiqc_config.yaml multiqc_config.yaml
    fi
  done
  echo "Finished logging programs options."
  echo ""
fi

# Document programs in PATH (primarily for program version ID)
echo "Logging system \$PATH..."
{
date
echo ""
echo "System PATH for $SLURM_JOB_ID"
echo ""
printf "%0.s-" {1..10}
echo "${PATH}" | tr : \\n
} >> system_path.log
echo "Finished logging system \$PATH."
```

---

# RESULTS

Runtime (~12hrs) was _remarkably_ faster than yesterday's runtime using the modified config file. In fact, using the default config file, the runtime was >50% _faster_! Good to know!
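One way to put a number on that comparison, assuming "faster" is read as a reduction in wall-clock time, is sketched below. The `percent_reduction` function is a hypothetical helper. The 11 hr 50 min figure comes from the job screencap, and yesterday's runtime is deliberately left as a value to fill in from the previous post.

```shell
#!/bin/bash
# Hypothetical helper: percent reduction in wall-clock time between two runs.
# Usage: percent_reduction <baseline_minutes> <new_minutes>
percent_reduction() {
  local baseline="$1"
  local new="$2"
  awk -v b="${baseline}" -v n="${new}" 'BEGIN {
    # Guard against dividing by zero or a negative baseline.
    if (b <= 0) { print "baseline must be > 0" > "/dev/stderr"; exit 1 }
    printf "%.1f\n", (b - n) / b * 100
  }'
}

# This run: 11 hrs 50 mins (from the screencap), expressed in minutes.
base_config_minutes=$(( 11 * 60 + 50 ))

# Fill in yesterday's runtime (in minutes) from the 20221214 post, then:
# percent_reduction "${modified_config_minutes}" "${base_config_minutes}"
```

Any result above 50.0 would correspond to the ">50% faster" reading used here.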
![Screencap showing runtime of 11hrs 50mins for 20221215-cgig-nextflow-epdiverse-snp-haws-hawaii-base_config job on Mox](https://github.com/RobertsLab/sams-notebook/blob/master/images/screencaps/20221215-cgig-nextflow-epdiverse-snp-haws-hawaii-base_config.png?raw=true)

Output folder:

- [20221215-cgig-nextflow-epdiverse-snp-haws-hawaii-base_config/](https://gannet.fish.washington.edu/Atumefaciens/20221215-cgig-nextflow-epdiverse-snp-haws-hawaii-base_config/)

  - #### Variant Call Format (VCF) files and index files:

    - [20221215-cgig-nextflow-epdiverse-snp-haws-hawaii-base_config/snps/vcf/](https://gannet.fish.washington.edu/Atumefaciens/20221215-cgig-nextflow-epdiverse-snp-haws-hawaii-base_config/snps/vcf/)
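For a quick sanity check of the output, a sketch like the following counts the variant records in each VCF after the `snps/vcf/` directory has been downloaded locally. Both function names are hypothetical, and the sketch assumes the VCFs are gzip/bgzip-compressed with a `.vcf.gz` extension, which this post does not confirm.

```shell
#!/bin/bash
# Hypothetical helper: count variant records (non-header lines) in one VCF.
# ASSUMPTION: the VCF is gzip- or bgzip-compressed.
count_vcf_records() {
  local vcf="$1"
  # Header lines start with '#'; every other line is one variant record.
  # Note: grep exits non-zero when the count is 0, but still prints "0".
  gzip -cd "${vcf}" | grep -vc '^#'
}

# Hypothetical helper: print "<file name><TAB><record count>" for each
# *.vcf.gz file in a local directory.
summarize_vcfs() {
  local vcf_dir="$1"
  local vcf
  for vcf in "${vcf_dir}"/*.vcf.gz; do
    # Skip the unexpanded glob when no files match.
    [[ -e "${vcf}" ]] || continue
    printf '%s\t%s\n' "$(basename "${vcf}")" "$(count_vcf_records "${vcf}")"
  done
}

# Example, with a hypothetical local copy of the snps/vcf/ directory:
# summarize_vcfs "./snps/vcf"
```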