---
author: Sam White
toc-title: Contents
toc-depth: 5
toc-location: left
layout: post
title: Transposable Element Mapping - Crassostrea gigas Genome v9 Using RepeatMasker 4.07 on Roadrunner
date: '2019-03-27 09:08'
tags:
  - TE
  - transposable elements
  - roadrunner
  - RepeatMasker
  - jupyter notebook
  - Crasssostrea gigas
  - Pacific oyster
categories:
  - 2019
  - Miscellaneous
---
Per this [GitHub issue](https://github.com/RobertsLab/resources/issues/643), I'm IDing transposable elements (TEs) in the _Crassostrea gigas_ genome. Even though the _C. gigas_ genome should be fully annotated, Steven wants a set of analyses comparable to [the _Crassostrea virginica_ TE mapping](https://robertslab.github.io/sams-notebook/posts/2018/2018-08-28-transposable-element-mapping-crassostrea-virginica-genome-cvirginica_v300-using-repeatmasker-4-07/) we previously performed.

I used the _Crassostrea gigas_ genome we have linked on our [GitHub Genomic Resources wiki](https://github.com/RobertsLab/resources/wiki/Genomic-Resources):

- [Crassostrea_gigas.oyster_v9.dna_sm.toplevel.fa](http://owl.fish.washington.edu/halfshell/genomic-databank/Crassostrea_gigas.oyster_v9.dna_sm.toplevel.fa)

Analysis was performed in the following Jupyter Notebook (GitHub):

- [20190327_roadrunner_cgig_TEs_repeatmasker.ipynb](https://github.com/RobertsLab/code/blob/master/notebooks/sam/20190327_roadrunner_cgig_TEs_repeatmasker.ipynb)

---

# RESULTS

This took ~24hrs to complete.

Output folder:

- [20190327_cgig_repeatmasker_all/](http://gannet.fish.washington.edu/Atumefaciens/20190327_cgig_repeatmasker_all)

Genome used (from our [Genomic Resources wiki](https://github.com/RobertsLab/resources/wiki/Genomic-Resources#genome)):

- [Crassostrea_gigas.oyster_v9.dna_sm.toplevel.fa](http://owl.fish.washington.edu/halfshell/genomic-databank/Crassostrea_gigas.oyster_v9.dna_sm.toplevel.fa)

GFF file:

- [20190327_cgig_repeatmasker_all/Crassostrea_gigas.oyster_v9.dna_sm.toplevel.fa.out.gff](http://gannet.fish.washington.edu/Atumefaciens/20190327_cgig_repeatmasker_all/Crassostrea_gigas.oyster_v9.dna_sm.toplevel.fa.out.gff)

Summary table (text):

- [20190327_cgig_repeatmasker_all/Crassostrea_gigas.oyster_v9.dna_sm.toplevel.fa.tbl](http://gannet.fish.washington.edu/Atumefaciens/20190327_cgig_repeatmasker_all/Crassostrea_gigas.oyster_v9.dna_sm.toplevel.fa.tbl)

| file name | `Crassostrea_gigas.oyster_v9.dna_sm.toplevel.fa` |
|:---|:---|
| sequences | 7658 |
| total length | 557717710 bp (491860439 bp excl N/X-runs) |
| GC level | 33.42 % |
| bases masked | 160369613 bp (32.60 %) |

| Repeat class | Number of elements\* | Length occupied | Percentage of sequence |
|:---|---:|---:|---:|
| Retroelements | 48481 | 19773596 bp | 4.02 % |
| SINEs: | 2498 | 317084 bp | 0.06 % |
| Penelope | 5749 | 1808270 bp | 0.37 % |
| LINEs: | 26463 | 10472676 bp | 2.13 % |
| CRE/SLACS | 15 | 1289 bp | 0.00 % |
| L2/CR1/Rex | 1712 | 307207 bp | 0.06 % |
| R1/LOA/Jockey | 299 | 21470 bp | 0.00 % |
| R2/R4/NeSL | 218 | 69735 bp | 0.01 % |
| RTE/Bov-B | 8417 | 3631379 bp | 0.74 % |
| L1/CIN4 | 983 | 64189 bp | 0.01 % |
| LTR elements: | 19520 | 8983836 bp | 1.83 % |
| BEL/Pao | 2050 | 1349545 bp | 0.27 % |
| Ty1/Copia | 2139 | 189535 bp | 0.04 % |
| Gypsy/DIRS1 | 11971 | 6501545 bp | 1.32 % |
| Retroviral | 1263 | 69288 bp | 0.01 % |
| DNA transposons | 299050 | 85782505 bp | 17.44 % |
| hobo-Activator | 9348 | 2278556 bp | 0.46 % |
| Tc1-IS630-Pogo | 32515 | 8695261 bp | 1.77 % |
| En-Spm | 0 | 0 bp | 0.00 % |
| MuDR-IS905 | 0 | 0 bp | 0.00 % |
| PiggyBac | 4136 | 747000 bp | 0.15 % |
| Tourist/Harbinger | 11590 | 2828277 bp | 0.58 % |
| Other (Mirage, P-element, Transib) | 232 | 14514 bp | 0.00 % |
| Rolling-circles | 0 | 0 bp | 0.00 % |
| Unclassified: | 109149 | 49075277 bp | 9.98 % |
| Total interspersed repeats: | | 154631378 bp | 31.44 % |
| Small RNA: | 830 | 93282 bp | 0.02 % |
| Satellites: | 2087 | 401812 bp | 0.08 % |
| Simple repeats: | 110847 | 4687373 bp | 0.95 % |
| Low complexity: | 16716 | 787611 bp | 0.16 % |

\* most repeats fragmented by insertions or deletions have been counted as one element

Runs of >=20 X/Ns in query were excluded in % calcs

The query species was assumed to be root

RepeatMasker Combined Database: Dfam_Consensus-20170127, RepBase-20170127

run with rmblastn version 2.6.0+

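
A quick sanity check on the summary above: the percentages appear to be calculated against the genome length excluding N/X-runs (491860439 bp) rather than the total length, which matches the "Runs of >=20 X/Ns in query were excluded in % calcs" note. Here's a minimal Python sketch of that arithmetic, using values copied from the summary. The variable and function names are just mine for illustration and aren't part of RepeatMasker or the notebook.

```python
# Values copied from the RepeatMasker summary above.
TOTAL_LENGTH_BP = 557_717_710
NON_N_LENGTH_BP = 491_860_439  # total length excluding N/X-runs

# Repeat class -> (bp occupied, percentage reported by RepeatMasker)
REPORTED = {
    "bases masked": (160_369_613, 32.60),
    "Retroelements": (19_773_596, 4.02),
    "DNA transposons": (85_782_505, 17.44),
    "Unclassified": (49_075_277, 9.98),
    "Total interspersed repeats": (154_631_378, 31.44),
}


def percent_of(bp, denominator_bp):
    """Return bp as a percentage of denominator_bp, rounded to 2 decimals."""
    return round(100 * bp / denominator_bp, 2)


# Compare each reported percentage with one recomputed from the non-N length.
for name, (bp, reported_pct) in REPORTED.items():
    computed_pct = percent_of(bp, NON_N_LENGTH_BP)
    print(f"{name}: computed {computed_pct:.2f} %, reported {reported_pct:.2f} %")

# The three top-level interspersed classes should add up to the reported total.
interspersed_sum = sum(
    REPORTED[name][0]
    for name in ("Retroelements", "DNA transposons", "Unclassified")
)
print(interspersed_sum == REPORTED["Total interspersed repeats"][0])
```

Dividing by `TOTAL_LENGTH_BP` instead gives smaller percentages, so it's worth using the same denominator when comparing against another genome.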
I've put together the TE comparison requested in the [GitHub Issue mentioned above](https://github.com/RobertsLab/resources/issues/643) in a Google Sheet:
- [20190327_te_comparison_cgig_cvir](https://docs.google.com/spreadsheets/d/1Or-zrbFAq2xl4iDNIJPJX0I9DWLtpa3m7gqDrFw-NT8/edit?usp=sharing)
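
For pulling these numbers into a comparison without copying them by hand, the class rows of a RepeatMasker `.tbl` file can be parsed with a regular expression. This is only a rough sketch, assuming the rows look like the ones in the `.tbl` file linked above (name, element count, length in bp, percentage). The `parse_tbl_rows` function and the sample text are hypothetical, and this isn't necessarily how the Google Sheet was put together.

```python
import re

# Matches rows such as "LINEs:   26463   10472676 bp   2.13 %".
# The element count is optional because the
# "Total interspersed repeats" row has no count column.
ROW = re.compile(
    r"^\s*(?P<name>\S.*?)\s+"
    r"(?:(?P<count>\d+)\s+)?"
    r"(?P<bp>\d+)\s+bp\s+"
    r"(?P<pct>\d+\.\d+)\s*%\s*$"
)


def parse_tbl_rows(text):
    """Return {class name: {"count", "bp", "pct"}} for rows that match ROW."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # header, separator, and footnote lines are skipped
        name = match.group("name").rstrip(":")
        count = match.group("count")
        rows[name] = {
            "count": int(count) if count is not None else None,
            "bp": int(match.group("bp")),
            "pct": float(match.group("pct")),
        }
    return rows


# A few lines copied from the C. gigas summary, for illustration.
SAMPLE = """\
bases masked:  160369613 bp ( 32.60 %)
Retroelements        48481     19773596 bp    4.02 %
   LINEs:            26463     10472676 bp    2.13 %
DNA transposons     299050     85782505 bp   17.44 %
Total interspersed repeats:   154631378 bp   31.44 %
"""

rows = parse_tbl_rows(SAMPLE)
for name, values in rows.items():
    print(name, values["count"], values["bp"], values["pct"])
```

Running the same function on the _C. virginica_ `.tbl` file would give two dictionaries keyed by repeat class that can be lined up side by side. Note that a class name wrapped across two lines, like "Other (Mirage, P-element, Transib)", comes through with only the first line's text as its name.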