--- author: Sam White toc-title: Contents toc-depth: 5 toc-location: left layout: post title: Transposable Element Mapping - Crassostrea gigas Genome v9 Using RepeatMasker 4.07 on Roadrunner date: '2019-03-27 09:08' tags: - TE - transposable elements - roadrunner - RepeatMasker - jupyter notebook - Crasssostrea gigas - Pacific oyster categories: - 2019 - Miscellaneous --- Per this [GitHub issue](https://github.com/RobertsLab/resources/issues/643), I'm IDing transposable elements (TEs) in the _Crassostrea gigas_ genome. Even though the _C.gigas_ genome should be fully annotated, Steven wants a comparable set of analyses to compare to [the _Crassostrea virginica_ TE mapping](https://robertslab.github.io/sams-notebook/posts/2018/2018-08-28-transposable-element-mapping-crassostrea-virginica-genome-cvirginica_v300-using-repeatmasker-4-07/) we previously performed. I used the _Crassostrea gigas_ genome we have linked on our [GitHub Genomic Resources wiki](https://github.com/RobertsLab/resources/wiki/Genomic-Resources): - [Crassostrea_gigas.oyster_v9.dna_sm.toplevel.fa](http://owl.fish.washington.edu/halfshell/genomic-databank/Crassostrea_gigas.oyster_v9.dna_sm.toplevel.fa) Analysis was performed in the following Jupyter Notebok (GitHub): - [20190327_roadrunner_cgig_TEs_repeatmasker.ipynb](https://github.com/RobertsLab/code/blob/master/notebooks/sam/20190327_roadrunner_cgig_TEs_repeatmasker.ipynb) --- # RESULTS This took ~24hrs to complete. Output folder: - [20190327_cgig_repeatmasker_all/](http://gannet.fish.washington.edu/Atumefaciens/20190327_cgig_repeatmasker_all) Genome used (from our [Genomic Resources wiki](https://github.com/RobertsLab/resources/wiki/Genomic-Resources#genome)): - [Crassostrea_gigas.oyster_v9.dna_sm.toplevel.fa](http://owl.fish.washington.edu/halfshell/genomic-databank/Crassostrea_gigas.oyster_v9.dna_sm.toplevel.fa) GFF file: - [20190327_cgig_repeatmasker_all/Crassostrea_gigas.oyster_v9.dna_sm.toplevel.fa.out.gff](http://gannet.fish.washington.edu/Atumefaciens/20190327_cgig_repeatmasker_all/Crassostrea_gigas.oyster_v9.dna_sm.toplevel.fa.out.gff) Summary table (text): - [20190327_cgig_repeatmasker_all/Crassostrea_gigas.oyster_v9.dna_sm.toplevel.fa.tbl](http://gannet.fish.washington.edu/Atumefaciens/20190327_cgig_repeatmasker_all/Crassostrea_gigas.oyster_v9.dna_sm.toplevel.fa.tbl)


==================================================
file name: Crassostrea_gigas.oyster_v9.dna_sm.toplevel.fa
sequences:          7658
total length:  557717710 bp  (491860439 bp excl N/X-runs)
GC level:         33.42 %
bases masked:  160369613 bp ( 32.60 %)
==================================================
               number of      length   percentage
               elements*    occupied  of sequence
--------------------------------------------------
Retroelements        48481     19773596 bp    4.02 %
   SINEs:             2498       317084 bp    0.06 %
   Penelope           5749      1808270 bp    0.37 %
   LINEs:            26463     10472676 bp    2.13 %
    CRE/SLACS           15         1289 bp    0.00 %
     L2/CR1/Rex       1712       307207 bp    0.06 %
     R1/LOA/Jockey     299        21470 bp    0.00 %
     R2/R4/NeSL        218        69735 bp    0.01 %
     RTE/Bov-B        8417      3631379 bp    0.74 %
     L1/CIN4           983        64189 bp    0.01 %
   LTR elements:     19520      8983836 bp    1.83 %
     BEL/Pao          2050      1349545 bp    0.27 %
     Ty1/Copia        2139       189535 bp    0.04 %
     Gypsy/DIRS1     11971      6501545 bp    1.32 %
       Retroviral     1263        69288 bp    0.01 %

DNA transposons     299050     85782505 bp   17.44 %
   hobo-Activator     9348      2278556 bp    0.46 %
   Tc1-IS630-Pogo    32515      8695261 bp    1.77 %
   En-Spm                0            0 bp    0.00 %
   MuDR-IS905            0            0 bp    0.00 %
   PiggyBac           4136       747000 bp    0.15 %
   Tourist/Harbinger 11590      2828277 bp    0.58 %
   Other (Mirage,      232        14514 bp    0.00 %
    P-element, Transib)

Rolling-circles          0            0 bp    0.00 %

Unclassified:       109149     49075277 bp    9.98 %

Total interspersed repeats:   154631378 bp   31.44 %


Small RNA:             830        93282 bp    0.02 %

Satellites:           2087       401812 bp    0.08 %
Simple repeats:     110847      4687373 bp    0.95 %
Low complexity:      16716       787611 bp    0.16 %
==================================================

* most repeats fragmented by insertions or deletions
  have been counted as one element
  Runs of >=20 X/Ns in query were excluded in % calcs


The query species was assumed to be root          
RepeatMasker Combined Database: Dfam_Consensus-20170127, RepBase-20170127

run with rmblastn version 2.6.0+

I've put together the TE comparison requested in the [GitHub Issue mentioned above](https://github.com/RobertsLab/resources/issues/643) in a Google Sheet: - [20190327_te_comparison_cgig_cvir](https://docs.google.com/spreadsheets/d/1Or-zrbFAq2xl4iDNIJPJX0I9DWLtpa3m7gqDrFw-NT8/edit?usp=sharing)