--- author: Sam White toc-title: Contents toc-depth: 5 toc-location: left layout: post title: Data Wrangling - Pgenerosa_v074.a3 Annotation Genome Repeats Compostion date: '2019-09-05 15:30' tags: - Panopea generosa - geoduck - repeats - Jupyter - Pgenerosa_v074.a3 categories: - 2019 - Geoduck Genome Sequencing --- Needed to pull some numbers on repeats in our Pgenerosa_v074.a3 GenSAS annotation ([from 20190710](https://robertslab.github.io/sams-notebook/posts/2019/2019-07-10-Genome-Annotation---Pgenerosa_v074-Using-GenSAS/)) for the manuscript we're putting together. Used the following file (repeats identified via RepeatMasker; original file name t_5d25021b0d20b-5d250896def4c-repeatmasker.gensas.gff3): [Panopea-generosa-vv0.74.a3.TE.gff3](https://owl.fish.washington.edu/halfshell/genomic-databank/Panopea-generosa-vv0.74.a3.TE.gff3) Repeats were analyzed by type of repeats: - DNA transposons (DNA) - Simple repeats - LINE (Long Interspersed Nuclear Elements) - LTR (Long Terminal Repeats) - SINE (Short Interspersed Nuclear Elements) - Unknown repeats Analysis was done in a Jupyter notebook (link below). Jupyter Notebook (GitHub): - [20190905_swoose_pgen_v074.a3_genome_repeats_counts.ipynb](https://github.com/RobertsLab/code/blob/master/notebooks/sam/20190905_swoose_pgen_v074.a3_genome_repeats_counts.ipynb) --- # RESULTS Will add info to manuscript and create a table. ``` Panopea-generosa-vv0.74.a3.repeats.DNA.gff3 ------------------------- percent 1.2 sum 11316890.00 mean 293.09 min 1.00 median 154.00 max 7012.00 Panopea-generosa-vv0.74.a3.repeats.Simple_repeat.gff3 ------------------------- percent 2.03 sum 19132744.00 mean 63.78 min 1.00 median 41.00 max 12422.00 Panopea-generosa-vv0.74.a3.repeats.LINE.gff3 ------------------------- percent 3.11 sum 29258694.00 mean 396.48 min 11.00 median 227.00 max 6604.00 Panopea-generosa-vv0.74.a3.repeats.LTR.gff3 ------------------------- percent 0.46 sum 4355629.00 mean 370.63 min 1.00 median 276.00 max 6541.00 Panopea-generosa-vv0.74.a3.repeats.SINE.gff3 ------------------------- percent 2.3 sum 21645991.00 mean 147.84 min 1.00 median 142.00 max 934.00 Panopea-generosa-vv0.74.a3.repeats.Unknown.gff3 ------------------------- percent 31.14 sum 2.933161e+08 mean 2.001500e+02 min 1.100000e+01 median 1.450000e+02 max 1.098100e+04 ------------------------- Repeats composition of genome (percent): 40.24 ```