--- author: Sam White toc-title: Contents toc-depth: 5 toc-location: left layout: post title: Annotation - Olurida_v081 MAKER Proteins BLASTp on Mox date: '2019-01-08 09:25' tags: - Ostrea lurida - Olympia oyster - MAKER - blastp - Olurida_v081 categories: - 2019 - Olympia Oyster Genome Sequencing --- [After getting through the initial MAKER annotation and SNAP gene predictions](https://robertslab.github.io/sams-notebook/posts/2018/2018-11-27-Annotation-Olurida_v081-MAKER-on-Mox/), and then [renaming the sequences](https://robertslab.github.io/sams-notebook/2019/01/08/Annotation-Olurida_v081-MAKER-ID-Mapping/), I needed to run BLASTp on the FastA file produced by MAKER id mapping in order to assign functionality to the predicted genes. Input FastA (9.3MB): - [20190108_oly_maker_id_mapping/20181127_oly_genome_snap02.all.maker.proteins.renamed.fasta](https://gannet.fish.washington.edu/Atumefaciens/20190108_oly_maker_id_mapping/20181127_oly_genome_snap02.all.maker.proteins.renamed.fasta) SBATCH script (text file): - [20190108_oly_maker_blastp/20190108_oly_maker_blastp.sh](https://gannet.fish.washington.edu/Atumefaciens/20190108_oly_maker_blastp/20190108_oly_maker_blastp.sh)

#!/bin/bash
## Job Name
#SBATCH --job-name=blastp
## Allocation Definition
#SBATCH --account=srlab
#SBATCH --partition=srlab
## Resources
## Nodes
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=15-00:00:00
## Memory per node
#SBATCH --mem=120G
##turn on e-mail notification
#SBATCH --mail-type=ALL
#SBATCH --mail-user=samwhite@uw.edu
## Specify the working directory for this job
#SBATCH --workdir=/gscratch/scrubbed/samwhite/outputs/20190108_oly_maker_blastp

# Load Python Mox module for Python module availability

module load intel-python3_2017

# Load Open MPI module for parallel, multi-node processing

module load icc_19-ompi_3.1.2

# Document programs in PATH (primarily for program version ID)

date >> system_path.log
echo "" >> system_path.log
echo "System PATH for $SLURM_JOB_ID" >> system_path.log
echo "" >> system_path.log
printf "%0.s-" {1..10} >> system_path.log
echo ${PATH} | tr : \\n >> system_path.log


# Add BLAST to system PATH
export PATH=$PATH:/gscratch/srlab/programs/ncbi-blast-2.6.0+/bin
export BLASTDB=/gscratch/srlab/blastdbs/UniProtKB_20181008

# Define variables

blastp=/gscratch/srlab/programs/ncbi-blast-2.6.0+/bin/blastp
uniprot_db=/gscratch/srlab/blastdbs/UniProtKB_20181008/20181008_uniprot_sprot.fasta
maker_p_fasta=/gscratch/scrubbed/samwhite/outputs/20190108_oly_maker_id_mapping/20181127_oly_genome_snap02.all.maker.proteins.renamed.fasta
output=20190108_blastp.outfmt6

# Run blastp
${blastp} \
-query ${maker_p_fasta} \
-db ${uniprot_db} \
-out ${output} \
-max_target_seqs 1 \
-evalue 1e-6 \
-outfmt 6 \
-num_threads 28
--- # RESULTS Output folder: - [20190108_oly_maker_blastp/](https://gannet.fish.washington.edu/Atumefaciens/20190108_oly_maker_blastp/) BLASTp output: - [20190108_oly_maker_blastp/20190108_blastp.outfmt6](https://gannet.fish.washington.edu/Atumefaciens/20190108_oly_maker_blastp/20190108_blastp.outfmt6) Great, now that I have this, I can use it to incorporate into the functional annotations of MAKER outputs (GFFs and FastA).