---
author: Sam White
toc-title: Contents
toc-depth: 5
toc-location: left
title: Transcriptome Annotation - M.magister De Novo Transcriptome Assembly for DuMOAR Project Using DIAMOND BLASTx on Mox
date: '2023-11-09'
draft: false
engine: knitr
categories:
  - mox
  - DIAMOND
  - BLASTx
  - transcriptome
  - 2023
  - DuMOAR
  - Dungeness crab
  - Metacarcinus magister
---

# Intro

As part of the process for using [`Trinotate`](https://github.com/Trinotate/Trinotate/wiki) for [transcriptome annotation](https://github.com/laurahspencer/DuMOAR/issues/44) (GitHub Issue), I previously ran [TransDecoder on 20231107](../2023-11-07-Transcriptome-Annotation---M.magister-De-Novo-Transcriptome-Assembly-for-DuMOAR-Project-Using-TransDecoder-on-Mox/index.qmd). Next, I needed to run BLASTx. As usual, I opted for [`DIAMOND`](https://github.com/bbuchfink/diamond) BLASTx, as it is insanely faster than [NCBI `BLASTx`](https://www.ncbi.nlm.nih.gov/books/NBK279690/).

This was run on Mox.

## SLURM script

- [20231109-mmag-transcriptome-diamond-blastx.sh](https://github.com/RobertsLab/sams-notebook/blob/master/sbatch_scripts/20231109-mmag-transcriptome-diamond-blastx.sh) (GitHub)

```bash
#!/bin/bash
## Job Name
#SBATCH --job-name=20231109-mmag-transcriptome-diamond-blastx
## Allocation Definition
#SBATCH --account=srlab
#SBATCH --partition=srlab
## Resources
## Nodes
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=3-00:00:00
## Memory per node
#SBATCH --mem=500G
##turn on e-mail notification
#SBATCH --mail-type=ALL
#SBATCH --mail-user=samwhite@uw.edu
## Specify the working directory for this job
#SBATCH --chdir=/gscratch/scrubbed/samwhite/outputs/20231109-mmag-transcriptome-diamond-blastx

### DIAMOND BLASTx of M.magister de novo transcriptome.

### For use in DuMOAR transcript annotation.
###################################################################################

# These variables need to be set by user

# Exit script if any command fails
set -e

# Load Python Mox module for Python module availability
module load intel-python3_2017

# SegFault fix?
export THREADS_DAEMON_MODEL=1

# Programs array
declare -A programs_array
programs_array=(
[diamond]="/gscratch/srlab/programs/diamond-v2.1.1/diamond"
)

# DIAMOND UniProt database
dmnd_db="/gscratch/srlab/blastdbs/20230727-uniprot-swissprot-reviewed/uniprot_sprot.dmnd"

# Transcriptome (FastA)
fasta="/gscratch/srlab/sam/data/M_magister/transcriptomes/trinity_denovo.Trinity.fasta"

###################################################################################

# Strip leading path
no_path="${fasta##*/}"

# Strip extension
no_ext="${no_path%.*}"

# Run DIAMOND with blastx
${programs_array[diamond]} blastx \
--db "${dmnd_db}" \
--query "${fasta}" \
--out "${no_ext}".blastx.outfmt6 \
--outfmt 6 \
--sensitive \
--evalue 1e-25 \
--max-target-seqs 1 \
--block-size 15.0 \
--index-chunks 4

# Generate checksums for future reference
for file in *
do
  echo ""
  echo "Generating checksum for ${file}."
  echo ""
  md5sum "${file}" | tee --append checksums.md5
  echo ""
done

echo "Generating checksum for ${fasta}."
echo ""

md5sum "${fasta}" | tee --append checksums.md5

###################################################################################

# Disable exit on error
set +e

# Capture program options
if [[ "${#programs_array[@]}" -gt 0 ]]; then
  echo "Logging program options..."
  for program in "${!programs_array[@]}"
  do
    {
    echo "Program options for ${program}: "
    echo ""
    # Handle samtools help menus
    if [[ "${program}" == "samtools_index" ]] \
    || [[ "${program}" == "samtools_sort" ]] \
    || [[ "${program}" == "samtools_view" ]]
    then
      ${programs_array[$program]}

    # Handle DIAMOND BLAST menu
    elif [[ "${program}" == "diamond" ]]; then
      ${programs_array[$program]} help

    # Handle NCBI BLASTx menu
    elif [[ "${program}" == "blastx" ]]; then
      ${programs_array[$program]} -help

    # Fall back to -h for any other program
    else
      ${programs_array[$program]} -h
    fi
    echo ""
    echo ""
    echo "----------------------------------------------"
    echo ""
    echo ""
    } &>> program_options.log || true

    # If MultiQC is in programs_array, copy the config file to this directory.
    if [[ "${program}" == "multiqc" ]]; then
      cp --preserve ~/.multiqc_config.yaml multiqc_config.yaml
    fi
  done
fi

# Document programs in PATH (primarily for program version ID)
{
  date
  echo ""
  echo "System PATH for $SLURM_JOB_ID"
  echo ""
  printf "%0.s-" {1..10}
  echo ""
  echo "${PATH}" | tr : \\n
} >> system_path.log
```

---

# RESULTS

## Runtime

As usual, [`DIAMOND`](https://github.com/bbuchfink/diamond) BLASTx was ridiculously fast: 1 min and 17 secs!

![Screencap showing Mox DIAMOND BLASTx run time of 1min and 17secs](20231109-mmag-transcriptome-diamond-blastx-runtime.png)

## FILES

Output directory:

- [20231109-mmag-transcriptome-diamond-blastx/](https://gannet.fish.washington.edu/Atumefaciens/20231109-mmag-transcriptome-diamond-blastx/)

### BLASTx (txt; output format 6)

- [trinity_denovo.Trinity.blastx.outfmt6](https://gannet.fish.washington.edu/Atumefaciens/20231109-mmag-transcriptome-diamond-blastx/trinity_denovo.Trinity.blastx.outfmt6)
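
For a quick sanity check of the output file, something like the following could be used. This is only a minimal sketch and was not part of the SLURM job above: the function name `summarize_outfmt6` is hypothetical, and it assumes the file uses the default 12 tab-delimited columns of output format 6 (`qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore`).

```bash
#!/bin/bash
# Hypothetical helper (not part of the original SLURM script).
# Assumes default outfmt 6 columns:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
summarize_outfmt6() {
  local outfmt6="$1"
  awk -F'\t' '
    # Count every alignment row and track distinct query/subject IDs
    {
      rows++
      if (!($1 in queries))  { queries[$1] = 1;  n_queries++ }
      if (!($2 in subjects)) { subjects[$2] = 1; n_subjects++ }
    }
    # Print a small tab-delimited summary
    END {
      printf "alignments\t%d\n", rows
      printf "queries_with_hit\t%d\n", n_queries
      printf "unique_subjects\t%d\n", n_subjects
    }
  ' "${outfmt6}"
}
```

Usage would be along the lines of `summarize_outfmt6 trinity_denovo.Trinity.blastx.outfmt6`, run from the output directory. Comparing `queries_with_hit` against the number of sequences in the Trinity FastA gives a rough idea of how many transcripts received a SwissProt match at the `1e-25` e-value cutoff.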