--- author: Sam White toc-title: Contents toc-depth: 5 toc-location: left layout: post title: Genome Annotation - Pgenerosa_v070 and v074 Top 18 Scaffolds Feature Count Comparisons date: '2019-07-22 11:03' tags: - Panopea generosa - geoduck - Pgenerosa_v070 - Pgenerosa_v074 - v070 - v074 - gff categories: - 2019 - Geoduck Genome Sequencing --- After [annotating Pgenerosa_v074 on 20190701](https://robertslab.github.io/sams-notebook/posts/2019/2019-07-01-Genome-Annotation---Pgenerosa_v074-MAKER-on-Mox/), we noticed a large discrepancy in the number of transcripts that MAKER identified, compared to Pgenerosa_v070. As a reminder, the Pgenerosa_v074 is a subset of Pgenerosa_v070 containing only the top 18 longest scaffolds. So, we decided to do a quick comparison of the annotations present in these 18 scaffolds Pgenerosa_v070 and Pgenerosa_v074. Briefly, used `grep` to pull out features identified in the same 18 scaffolds in the Pgenerosa_v074 assembly from [Pgenerosa_v070 annotated GFF from 20190228](https://robertslab.github.io/sams-notebook/posts/2019/2019-02-28-Genome-Annotation---Pgenerosa_v070-MAKER-on-Mox/) and then counted the number of features identified in this newly subsetted GFF. It's all documented in the Jupyter Notebook below. Jupyter Notebook (GitHub): - [20190722_swoose_pgen_v070_top18_scaffold_annotation_count.ipynb](https://github.com/RobertsLab/code/blob/master/notebooks/sam/20190722_swoose_pgen_v070_top18_scaffold_annotation_count.ipynb) --- # RESULTS Output folder: - [20190722_pgen_v070_top18_scaffold_annotation_counts/](https://gannet.fish.washington.edu/Atumefaciens/20190722_pgen_v070_top18_scaffold_annotation_counts/) GFF subset: - [20190722_pgen_v070_top18_scaffold_annotation_counts/pgen_v070_top18_scaffolds.gff](https://gannet.fish.washington.edu/Atumefaciens/20190722_pgen_v070_top18_scaffold_annotation_counts/pgen_v070_top18_scaffolds.gff) #### SUMMARY TABLE | GENOME | CDS | exon | gene | mRNA | |---------------------------|--------|--------|-------|-------| | Pgen_v070 | 169460 | 175006 | 53035 | 53035 | | Pgen_v070_top18_scaffolds | 41187 | 43554 | 8404 | 8404 | | Pgen_v074 | 10764 | 10940 | 1712 | 1712 | --- Well, we definitely see a difference between annotations of those top 18 scaffolds (~6-fold difference). However, there is still a _huge_ difference between the full Pgenerosa_v070 annotation and the top 18 scaffolds from the Pgenerosa_v070 annotation. I'll be performing some RNAseq alignments to these various assemblies. That should provide us with some evidence that we can use to support/refute some of the annotations that are present/absent.