Gene annotation

Author

Steven Roberts

Published

August 30, 2024

Summary

Here I will try to annotate the Manila clam genes. I know NCBI has an annotation, and I will also BLAST against Swiss-Prot (SP) to get GO information.

What does NCBI have?

https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_026571515.1/

I grabbed the download command by clicking the ‘datasets’ button on that page (shown in the screenshot).

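# Download the NCBI RefSeq annotation package for the Manila clam assembly
cd ../data

/home/shared/datasets \
download genome accession GCF_026571515.1 --include gff3,rna,cds,protein,genome,seq-report
# Unpack the archive (datasets saves it as ncbi_dataset.zip by default)
cd ../data
unzip ncbi_dataset.zip
# Preview the first lines of every file in the package
head /home/shared/8TB_HDD_03/sr320/github/clamgonads-macsamples/data/ncbi_dataset/data/GCF_026571515.1/*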
==> /home/shared/8TB_HDD_03/sr320/github/clamgonads-macsamples/data/ncbi_dataset/data/GCF_026571515.1/cds_from_genomic.fna <==
>lcl|NW_026851514.1_cds_XP_060579799.1_1 [gene=LOC132712676] [db_xref=GeneID:132712676] [protein=uncharacterized protein LOC132712676] [protein_id=XP_060579799.1] [location=complement(join(8570..8679,9698..9800,9934..10108,10375..10520,10828..10970,11352..11870,12239..12377,12599..12643))] [gbkey=CDS]
ATGGATAATCCATGTTCGTATGAATCAGTTAAAAGAGTTATCAAAGAAATACGAAAAACTAGCGGAGTGACAGAAGATAA
CTTAAAAACATGGTGTATTATGGGATGCGACGGCCTTCCCTATACGCTAGGATCGAGACTAATTGAAAAAAATAAAGATA
TGCAAAATATCTTACTTATTCCCGGACACGGGCACATAGAGATGAATGTGGTAAAAGCTGCTTTTAAGTTACTGTGGGAG
CCCATTCTACAGGACTTAAGTAAGGAATGTGGTTTCAAGTCACCAAGGGCACAAGTTGCGGCACAATCGTGTACTGATCA
CCACAAATCATACATGCTTTTGGAAATAATGTTTGAAAGTGCTTTGCAGGAGATCATGACAACTTTTATCATAAATAGTG
TGCAGAATTCAATAACACCTAACATAGCCACTTTTTTTGATTATATAAAATCTTCAAAAGACAAAAATTATAGATTTATG
TGTGACGCAATAATAAACTTTATTTTTCCAATATTTCTATATAGAGCCGGTGTTAGGAGAAACAATTTCGGATATATATC
TGCAGCTAAAGCTAAATTTTTTAAACTATTTTTTTCTGGCGGAATGAAAAATTATCAGCAACTTATTATGAAAGATATTA
AAACATACATCCTAGCACCTCCAGAAGTGAAGCATTTCTTACAGAAAACCCAATCTTTTACTGTCAGTGGACACCCTTCA

==> /home/shared/8TB_HDD_03/sr320/github/clamgonads-macsamples/data/ncbi_dataset/data/GCF_026571515.1/GCF_026571515.1_ASM2657151v2_genomic.fna <==
>NW_026851514.1 Ruditapes philippinarum isolate M1 unplaced genomic scaffold, ASM2657151v2 ctg11145_1_1_1_1, whole genome shotgun sequence
AAAAAaacgttcttttttttttttttcaaataaacattaatcacttcagcgcgcgcgcaattatcgttttcaaattaaag
tttcgtaagatatgttaatgacttcccgatgcgatcgactgtaagtaatttgtatatgtatgcggttttttggggttttt
tttttgtttgttttttaattaaagcttcttattctttaatattttttatgtatctcataatctgtattgaaaatttgaat
tttattacacatggactattcaaattatgatgcgttgccctagaattgaataaactttcttaattacacaatacggggtc
atactgatctttgcgtctttctctgaaataattatttccttgttaaaacatatttggaccacatatttctgtcaattgtt
cactttacttccgatttttgttgctacggaaacacatgtttctatttgtgttttctaatcaattcgaatctataatcagt
ttaatgggatcaccaccgggtaaaacgactactaggtactttggtcatgtttttaccccaaatatttgaatgtctctatt
gggattaaaatgagcatagttaactaaacaactggttgtataaatccctgacctgtctaattctaattatgagattagtg
aatatataagactttcactacaaaatattcctaaacaaaagagcacatgtttaaatttattcctatacaaataaaagatc

==> /home/shared/8TB_HDD_03/sr320/github/clamgonads-macsamples/data/ncbi_dataset/data/GCF_026571515.1/genomic.gff <==
##gff-version 3
#!gff-spec-version 1.21
#!processor NCBI annotwriter
#!genome-build ASM2657151v2
#!genome-build-accession NCBI_Assembly:GCF_026571515.1
#!annotation-date 10/30/2023
#!annotation-source NCBI RefSeq GCF_026571515.1-RS_2023_10
##sequence-region NW_026851514.1 1 38808
##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=129788
NW_026851514.1  RefSeq  region  1   38808   .   +   .   ID=NW_026851514.1:1..38808;Dbxref=taxon:129788;Name=Unknown;chromosome=Unknown;collection-date=2014-10;country=USA: Puget Sound region%2C Pacific Northwest;dev-stage=adult clam;gbkey=Src;genome=genomic;isolate=M1;isolation-source=lagoon;mol_type=genomic DNA;sex=male;tissue-type=mantle

==> /home/shared/8TB_HDD_03/sr320/github/clamgonads-macsamples/data/ncbi_dataset/data/GCF_026571515.1/protein.faa <==
>XP_060551064.1 uncharacterized protein LOC132760673 [Ruditapes philippinarum]
MPKRKPSASTSGKGKATTSSTSDDHRLASIIANALVQNKSALKEVAALLPPMTIEPGIRPDGTEAELTEPDHAVQKQPRL
GNRGFGTGSSPKQGPCTNSTRNVTMDIAHVKKSLLHSSLAPGTHKAYDRFWERFLCFVSTSVVHFSPLPATPDCISDFVA
HLHILAFAPSSISSHLSAISHFHNISGFTDPCENFITRKMLVGCRKINLRSDTRKPLLNNHIQLLCQAVKEMFAHVPYLK
YLYMALILTAFNGFFRLGELLPATISSADKVVQITDLSSSSKSVRLKLLNHKTNKSDKPTLILMKSLQTNCPVKALNNYL
SMRGQSAGPLFLLANNSPLTLPSFREVFKLLLRLANLSPVHYKLHSFRIGACTQAILSGTPENEVMRMGRWKSNAFKRYI
RMPVVNATH
>XP_060551065.1 transmembrane cell adhesion receptor mua-3-like isoform X1 [Ruditapes philippinarum]
MGIPGIMRHAYIAILLMNLVLRGGSVTDICQTATADIVFIVDSSGSVGSSGYDDEIDFIKAIVNELVIDPNEVRMGLIDY
STSVHTSLGFNLNDPNFDTNAEVIAKLNSLPYSGGSTRTDLAIQAAKNMFSGPGNRPDVPDVLFTLTDGETNDGGQDLLD

==> /home/shared/8TB_HDD_03/sr320/github/clamgonads-macsamples/data/ncbi_dataset/data/GCF_026571515.1/rna.fna <==
>XM_060695081.1 PREDICTED: Ruditapes philippinarum uncharacterized LOC132760673 (LOC132760673), mRNA
TAGCTTTATTTAGTGAAACATATAAAGATGTTTATATTTGTAAACTTATTATCATTGCTTACTCTAAGAGATTGGAGGAG
GAAGGGCCAGGATTTTCGGTTATATGAACTTTCCGGAAATTGTTAACCATTTCGGTTTTCTAGATGAGCCTCGCTTTGAA
AAGCGTCACTTTCGGACCACAGACTTAGTCATTTCAGAGAAGTTTGCTGTCACTTTGAATTGTTGAACAAAAATCAGTGC
ATAGATTGACATATTTTGAATTGTATGTGTTGAGTGATATTTTCTACATTGCCGGACGAGCTTCCACTATTTACAATGCC
AAAAAGAAAACCCTCCGCGTCTACCAGTGGTAAAGGCAAGGCAACAACATCATCCACTAGCGACGACCACAGGTTGGCGA
GCATCATCGCCAATGCTTTAGTGCAGAATAAGTCTGCACTGAAGGAAGTAGCCGCCTTATTACCACCAATGACGATAGAA
CCAGGCATCCGCCCCGACGGTACAGAGGCGGAATTAACAGAACCGGATCATGCCGTCCAAAAGCAGCCGCGGCTGGGTAA
CAGAGGCTTTGGAACTGGCTCCTCACCTAAACAGGGACCCTGTACCAATTCCACGAGAAATGTCACCATGGATATTGCTC
ACGTGAAGAAGTCCTTGTTGCATTCTTCTCTGGCTCCGGGAACCCATAAAGCTTACGATCGGTTTTGGGAACGGTTTCTA

==> /home/shared/8TB_HDD_03/sr320/github/clamgonads-macsamples/data/ncbi_dataset/data/GCF_026571515.1/sequence_report.jsonl <==
{"assemblyAccession":"GCF_026571515.1","assemblyUnit":"Primary Assembly","assignedMoleculeLocationType":"Chromosome","chrName":"Un","genbankAccession":"JAKTTH010000001.1","length":38808,"refseqAccession":"NW_026851514.1","role":"unplaced-scaffold","sequenceName":"ctg11145_1_1_1_1"}
{"assemblyAccession":"GCF_026571515.1","assemblyUnit":"Primary Assembly","assignedMoleculeLocationType":"Chromosome","chrName":"Un","genbankAccession":"JAKTTH010000002.1","length":239892,"refseqAccession":"NW_026851515.1","role":"unplaced-scaffold","sequenceName":"ctg3581_1_1_1_1"}
{"assemblyAccession":"GCF_026571515.1","assemblyUnit":"Primary Assembly","assignedMoleculeLocationType":"Chromosome","chrName":"Un","genbankAccession":"JAKTTH010000003.1","length":379583,"refseqAccession":"NW_026851516.1","role":"unplaced-scaffold","sequenceName":"ctg1911_1_1_1_1"}
{"assemblyAccession":"GCF_026571515.1","assemblyUnit":"Primary Assembly","assignedMoleculeLocationType":"Chromosome","chrName":"Un","genbankAccession":"JAKTTH010000004.1","length":285079,"refseqAccession":"NW_026851517.1","role":"unplaced-scaffold","sequenceName":"ctg1746_1_1_1_1"}
{"assemblyAccession":"GCF_026571515.1","assemblyUnit":"Primary Assembly","assignedMoleculeLocationType":"Chromosome","chrName":"Un","genbankAccession":"JAKTTH010000005.1","length":344284,"refseqAccession":"NW_026851518.1","role":"unplaced-scaffold","sequenceName":"ctg1952_1_1_1_1"}
{"assemblyAccession":"GCF_026571515.1","assemblyUnit":"Primary Assembly","assignedMoleculeLocationType":"Chromosome","chrName":"Un","genbankAccession":"JAKTTH010000006.1","length":15851,"refseqAccession":"NW_026851519.1","role":"unplaced-scaffold","sequenceName":"ctg18844_1_1_1_1"}
{"assemblyAccession":"GCF_026571515.1","assemblyUnit":"Primary Assembly","assignedMoleculeLocationType":"Chromosome","chrName":"Un","genbankAccession":"JAKTTH010000007.1","length":57002,"refseqAccession":"NW_026851520.1","role":"unplaced-scaffold","sequenceName":"ctg7485_1_1_1_1"}
{"assemblyAccession":"GCF_026571515.1","assemblyUnit":"Primary Assembly","assignedMoleculeLocationType":"Chromosome","chrName":"Un","genbankAccession":"JAKTTH010000008.1","length":134014,"refseqAccession":"NW_026851521.1","role":"unplaced-scaffold","sequenceName":"ctg10092_1_1_1_1"}
{"assemblyAccession":"GCF_026571515.1","assemblyUnit":"Primary Assembly","assignedMoleculeLocationType":"Chromosome","chrName":"Un","genbankAccession":"JAKTTH010000009.1","length":35506,"refseqAccession":"NW_026851522.1","role":"unplaced-scaffold","sequenceName":"ctg9815_1_1_1_1"}
{"assemblyAccession":"GCF_026571515.1","assemblyUnit":"Primary Assembly","assignedMoleculeLocationType":"Chromosome","chrName":"Un","genbankAccession":"JAKTTH010000010.1","length":49325,"refseqAccession":"NW_026851523.1","role":"unplaced-scaffold","sequenceName":"ctg6631_1_1_1_1"}
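
The previews above show what NCBI provides: each CDS header carries the gene, protein ID, and product name, and the sequence report lists one scaffold per line with its length. Below is a minimal sketch of how those fields could be pulled into small tables with standard tools. It was not part of the run above. The function names `cds_headers_to_table` and `summarize_seq_report` and the output file name `cds_annotation.tsv` are hypothetical. The parsing assumes headers and JSON lines formatted like the ones shown here. A product name that itself contains a `]` would be cut short.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a first look at the NCBI package
# (these are NOT part of the NCBI datasets CLI).

# Read a CDS FASTA on stdin and print protein_id, gene, product
# (tab-separated) for each header line. Assumes NCBI-style
# "[key=value]" header fields; a missing field is reported as NA.
cds_headers_to_table() {
  awk '
    /^>/ {
      gene = prot = product = "NA"
      if (match($0, /\[gene=[^]]*\]/))       gene    = substr($0, RSTART + 6,  RLENGTH - 7)
      if (match($0, /\[protein_id=[^]]*\]/)) prot    = substr($0, RSTART + 12, RLENGTH - 13)
      if (match($0, /\[protein=[^]]*\]/))    product = substr($0, RSTART + 9,  RLENGTH - 10)
      print prot "\t" gene "\t" product
    }'
}

# Read sequence_report.jsonl on stdin and print the number of sequences
# and their total length in bp (tab-separated). Uses grep/cut rather than
# a JSON parser, so it assumes the compact "length":<integer> form.
summarize_seq_report() {
  grep -o '"length":[0-9]*' | cut -d: -f2 |
    awk '{ n++; total += $1 } END { printf "%d\t%.0f\n", n, total }'
}

# Example usage from the data directory (paths as unpacked above):
# cds_headers_to_table < ncbi_dataset/data/GCF_026571515.1/cds_from_genomic.fna > cds_annotation.tsv
# summarize_seq_report < ncbi_dataset/data/GCF_026571515.1/sequence_report.jsonl
```

Keying the table on protein ID should make it straightforward to join with Swiss-Prot BLAST hits later, since protein.faa uses the same XP_ identifiers.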