---
author: Sam White
toc-title: Contents
toc-depth: 5
toc-location: left
layout: post
title: Data Wrangling - Pgenerosa_v074.a3 Annotation Genome Feature Sequence Lengths
date: '2019-09-04 15:15'
tags:
  - Panopea generosa
  - geoduck
  - Pgenerosa_v074.a3
  - Panopea-generosa-vv0.74.a3
  - Jupyter
categories:
  - 2019
  - Geoduck Genome Sequencing
---
The [GenSAS Pgenerosa_v074 annotation from 20190710](https://robertslab.github.io/sams-notebook/posts/2019/2019-07-10-Genome-Annotation---Pgenerosa_v074-Using-GenSAS/) (referred to as: Panopea-generosa-vv0.74.a3) recently completed (after nearly a month of running).

In preparation for a paper we're writing, we needed some summary stats for Panopea-generosa-vv0.74.a3. This info will be compiled into a table for the manuscript.

See our Genomic Resources wiki for more info on GFFs:

- [Genomic Resources Wiki](https://github.com/RobertsLab/resources/wiki/Genomic-Resources) (GitHub)

Calculations were performed using Python in a Jupyter Notebook.

Jupyter Notebook (GitHub):

- [20190904_swoose_pgen_v074.a3_genome_feature_counts.ipynb](https://github.com/RobertsLab/code/blob/master/notebooks/sam/20190904_swoose_pgen_v074.a3_genome_feature_counts.ipynb)

---

# RESULTS

I've copied/pasted the summary data for each of the GFFs that were analyzed, for quick reference. I'll get this compiled into a table of some sort for people to use for the manuscript.

```
Panopea-generosa-vv0.74.a3.exon.gff3
-------------------------
mean        255.932825
min           3.000000
median      157.000000
max       13359.000000

Panopea-generosa-vv0.74.a3.CDS.gff3
-------------------------
mean        255.932825
min           3.000000
median      157.000000
max       13359.000000

Panopea-generosa-vv0.74.a3.mRNA.gff3
-------------------------
mean      13318.053183
min         201.000000
median     2346.000000
max      345225.000000

Panopea-generosa-vv0.74.a3.gene.gff3
-------------------------
mean      13318.053183
min         201.000000
median     2346.000000
max      345225.000000
```
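
For anyone who wants to see roughly how summary stats like the ones above can be produced from a GFF, here is a minimal sketch. It is not the code from the linked notebook. It uses only the Python standard library and assumes each feature's length is `end - start + 1`, since GFF3 coordinates are 1-based and inclusive; whether the notebook applies the same `+ 1` is not stated in this post. The function names (`feature_lengths`, `summarize`) and the three example features, including their scaffold name and coordinates, are hypothetical and only there to make the sketch runnable.

```python
import io
import statistics


def feature_lengths(gff_lines):
    """Return the length of every feature in an iterable of GFF3 lines."""
    lengths = []
    for line in gff_lines:
        # Skip blank lines and comment/directive lines (e.g. ##gff-version).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Columns 4 and 5 of a GFF3 record are the start and end coordinates.
        start, end = int(fields[3]), int(fields[4])
        # Assumption: 1-based, inclusive coordinates, hence the +1.
        lengths.append(end - start + 1)
    return lengths


def summarize(lengths):
    """Return the same four summary stats reported in this post."""
    return {
        "mean": statistics.mean(lengths),
        "min": min(lengths),
        "median": statistics.median(lengths),
        "max": max(lengths),
    }


# Hypothetical three-feature GFF3 (made-up coordinates, not real data).
example_gff = io.StringIO(
    "##gff-version 3\n"
    "scaffold_01\tGenSAS\texon\t1\t100\t.\t+\t.\tID=exon1\n"
    "scaffold_01\tGenSAS\texon\t201\t250\t.\t+\t.\tID=exon2\n"
    "scaffold_01\tGenSAS\texon\t301\t303\t.\t-\t.\tID=exon3\n"
)

stats = summarize(feature_lengths(example_gff))
for name, value in stats.items():
    print(f"{name:<8}{value:>14.6f}")
```

To run this on a real file, pass an open file handle (for example, one opened on `Panopea-generosa-vv0.74.a3.exon.gff3`) to `feature_lengths()` in place of `example_gff`.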