---
title: "06-summer2021_blast"
output: html_document
date: "2024-02-26"
---

Rmd to perform BLAST with the _Pycnopodia helianthoides_ genome gene list against the published _Pycnopodia helianthoides_ genome.

Genome gene list: `project-pycno-sizeclass-2022/data/augustus.hints.codingseq`

Based on this Jupyter notebook by Steven Roberts: https://github.com/RobertsLab/code/blob/master/09-blast.ipynb

```{bash}
pwd
```

```{bash}
/home/shared/ncbi-blast-2.15.0+/bin/blastx -h
```

# Create a BLAST database

I would like to make a database of UniProt/Swiss-Prot. See https://www.uniprot.org/downloads

```{bash}
cd /home/shared/8TB_HDD_02/graceac9/GitHub/paper-pycno-sswd-2021/analyses/06-BLAST
# Download the current Swiss-Prot release, rename it, and unzip it (keeping the .gz)
curl -O https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz
mv uniprot_sprot.fasta.gz uniprot_sprot_r2021_03.fasta.gz
gunzip -k uniprot_sprot_r2021_03.fasta.gz
cd -
```

```{bash}
pwd
```

```{bash}
# Each bash chunk runs in its own shell, so the BLAST bin path is set in the chunk that uses it
bldr="/home/shared/ncbi-blast-2.15.0+/bin/"

${bldr}makeblastdb \
-in /home/shared/8TB_HDD_02/graceac9/GitHub/paper-pycno-sswd-2021/analyses/06-BLAST/uniprot_sprot_r2021_03.fasta \
-dbtype prot \
-out /home/shared/8TB_HDD_02/graceac9/GitHub/paper-pycno-sswd-2021/analyses/06-BLAST/uniprot_sprot_r2021_03
```

# Get a Query Sequence

```{bash}
pwd
```

`rsync` the `data/augustus.hints.codingseq` file to the Raven code directory. On the command line, `ssh` into Raven and move into this working directory. Then use the code below to `rsync` the data to this directory.

Code: `rsync --archive --progress --verbose graceac9@raven.fish.washington.edu:/home/shared/8TB_HDD_02/graceac9/GitHub/project-pycno-sizeclass-2022/data/augustus.hints.codingseq /home/shared/8TB_HDD_02/graceac9/GitHub/paper-pycno-sswd-2021/code/`

```{bash}
# How many sequences? Count ">" since each sequence has exactly one header line
grep -c ">" augustus.hints.codingseq
```

26581 sequences

# Run BLAST

```{bash}
pwd
```

Set paths to programs:

```{bash}
blast_dir="/home/shared/ncbi-blast-2.15.0+/bin"
blastx="${blast_dir}/blastx"
```

Set paths to files:

```{bash}
genome_fasta="/home/shared/8TB_HDD_02/graceac9/GitHub/paper-pycno-sswd-2021/code/augustus.hints.codingseq"
sp_db=""
```

```{bash}
pwd
```

Code from: https://sr320.github.io/tumbling-oysters/posts/sr320-04-mytgo/index.html

```{bash}
/home/shared/ncbi-blast-2.15.0+/bin/blastx \
-query augustus.hints.codingseq \
-db ../analyses/06-BLAST/uniprot_sprot_r2021_03 \
-out ../analyses/06-BLAST/summer2021-uniprot_blastx.tab \
-evalue 1E-20 \
-num_threads 40 \
-max_target_seqs 1 \
-outfmt 6
```

```{bash}
head -2 ../analyses/06-BLAST/summer2021-uniprot_blastx.tab
```

Note: In Excel, I split the columns that were "|" delimited and made them tab delimited.

# Annotate the blast output with uniprot-SP-GO

```{bash}
pwd
```

Move `/analyses/06-BLAST/summer2021-uniprot_blastx.tab` to the code directory.

The `uniprot-SP-GO.sorted` and `GO-GOslim.sorted` files exist in my crab repositories, so I'll copy them over into this code directory.

Check that everything is in the working directory:

```{bash}
ls
```

Yep!
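
The Excel step noted in the BLAST section (splitting the "|" delimited subject IDs into tab-delimited columns) could also be scripted so that it is reproducible. Below is a minimal sketch, assuming the subject column looks like `sp|ACCESSION|ENTRY_NAME` and that no other column contains `|`. The demo line, its accession `X00000`, and the `demo_*` file names are made up for illustration and do not come from the real output.

```bash
# Hypothetical demo: a made-up, single-line stand-in for the outfmt 6 table,
# written to a temp directory so no real file is touched.
demo_dir=$(mktemp -d)
printf 'g1.t1\tsp|X00000|FAKE_SPECIES\t55.0\t100\t45\t0\t1\t300\t1\t100\t1e-30\t120\n' \
  > "${demo_dir}/demo_blastx.tab"

# Replace every "|" with a tab, so "sp|X00000|FAKE_SPECIES" becomes three columns.
tr '|' '\t' < "${demo_dir}/demo_blastx.tab" > "${demo_dir}/demo_blastx_sep.tab"

# Print the number of tab-separated fields per line (12 original columns + 2 new ones).
awk -F'\t' '{ print NF }' "${demo_dir}/demo_blastx_sep.tab"
```

For the real table, the same `tr` call would read `../analyses/06-BLAST/summer2021-uniprot_blastx.tab` and write to a new file, which leaves the original BLAST output untouched.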