---
title: "10-comprehensive-annotation"
output: html_document
---

Run on raven

```{r}
Sys.info()
```

The *C. gigas* protein sequences we are annotating

```{bash}
head ../data/GCF_902806645.1_cgigas_uk_roslin_v1_translated_cds.faa
```

Creating a tab-delimited file where the sequence name is separated from the sequence

```{bash}
# Convert FASTA to three tab-separated columns: sequence ID, rest of header, sequence
perl -e '$count=0; $len=0; while(<>) {s/\r?\n//; s/\t/ /g; if (s/^>//) { if ($. != 1) {print "\n"} s/ |$/\t/; $count++; $_ .= "\t";} else {s/ //g; $len += length($_)} print $_;} print "\n"; warn "\nConverted $count FASTA records in $. lines to tabular format\nTotal sequence length: $len\n\n";' \
../data/GCF_902806645.1_cgigas_uk_roslin_v1_translated_cds.faa \
> ../output/GCF_902806645.1_cgigas_uk_roslin_v1_translated_cds.tab
```

Connecting gene ID (LOC#######) with NCBI accession (the query ID used for BLAST)

```{bash}
# Print the gene tag (field 2) before the accession (field 1), then strip "[gene=" and "]"
cat ../output/GCF_902806645.1_cgigas_uk_roslin_v1_translated_cds.tab \
| awk '{print $2, $1}' \
| sed 's/\[gene=//g' \
| sed 's/\]//g' \
> ../output/LOC_Acc.tab
```

Drosophila BLAST (tblastn, since the database is built from nucleotide CDS; the output file name still says blastp)

```{bash}
/home/shared/ncbi-blast-2.11.0+/bin/tblastn \
-query ../data/GCF_902806645.1_cgigas_uk_roslin_v1_translated_cds.faa \
-db ../blastdb/dmel-all-CDS-r6.37 \
-out ../output/CgR-blastp-dmel.tab \
-num_threads 20 \
-max_target_seqs 1 \
-max_hsps 1 \
-outfmt "6 qaccver saccver evalue"
```

Swiss-Prot BLAST

```{bash}
/home/shared/ncbi-blast-2.11.0+/bin/blastp \
-query ../data/GCF_902806645.1_cgigas_uk_roslin_v1_translated_cds.faa \
-db /home/shared/blast_dbs/20210613_ncbi_sp_v5/swissprot \
-out ../output/CgR-blastp-sp.tab \
-num_threads 20 \
-max_target_seqs 1 \
-max_hsps 1 \
-outfmt "6 qaccver saccver evalue"
```

Downloading the C. elegans protein set

```{bash}
cd ../blastdb
curl -O https://gannet.fish.washington.edu/seashell/bu-mox/blastdb/Caenorhabditis_elegans.WBcel235.pep.all.fa
```

Building the Drosophila CDS database used by the tblastn search above (this chunk points at `../../blastdb`, while the search points at `../blastdb`)

```{bash}
/home/shared/ncbi-blast-2.11.0+/bin/makeblastdb \
-in ../../blastdb/dmel-all-CDS-r6.37.fasta \
-dbtype nucl \
-out ../../blastdb/dmel-all-CDS-r6.37
```

C. elegans BLAST

```{bash}
/home/shared/ncbi-blast-2.11.0+/bin/blastp \
-query ../data/GCF_902806645.1_cgigas_uk_roslin_v1_translated_cds.faa \
-db ../blastdb/Caenorhabditis_elegans.WBcel235.pep \
-out ../output/CgR-blastp-cel.tab \
-num_threads 20 \
-max_target_seqs 1 \
-max_hsps 1 \
-outfmt "6 qaccver saccver evalue"
```
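
Possible next step: joining a BLAST table back to gene IDs through `LOC_Acc.tab`. This is a minimal sketch, not part of the original workflow. It assumes that column 1 of the BLAST table (`qaccver`) matches the accession column of `LOC_Acc.tab` exactly. The helper name `annotate_hits` and the toy IDs are made up for illustration.

```bash
# Hypothetical helper: prepend the gene ID to each BLAST hit.
#   $1 = gene/accession map laid out like LOC_Acc.tab (gene ID, then accession)
#   $2 = BLAST table written with -outfmt "6 qaccver saccver evalue"
annotate_hits() {
  awk -v OFS='\t' '
    NR == FNR { gene[$2] = $1; next }            # first file: accession -> gene ID
    ($1 in gene) { print gene[$1], $1, $2, $3 }  # second file: keep hits with a known gene
  ' "$1" "$2"
}

# Toy data only (made-up IDs), so the sketch runs without the real files
tmp=$(mktemp -d)
printf 'LOC000001 acc_A\nLOC000002 acc_B\n' > "$tmp/LOC_Acc.tab"
printf 'acc_A\tsubj_1\t1e-50\nacc_X\tsubj_2\t2e-05\n' > "$tmp/hits.tab"

# Prints one tab-separated row: gene ID, query accession, subject, e-value
annotate_hits "$tmp/LOC_Acc.tab" "$tmp/hits.tab"
rm -r "$tmp"
```

On the real files this would be called as `annotate_hits ../output/LOC_Acc.tab ../output/CgR-blastp-sp.tab`. Queries without a hit do not appear in the BLAST table, so they are absent from the result as well.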