--- author: Sam White toc-title: Contents toc-depth: 5 toc-location: left date: 2016-10-25 21:20:50+00:00 layout: post slug: data-management-geoduck-small-insert-library-genome-assembly-from-bgi title: Data Management – Geoduck Small Insert Library Genome Assembly from BGI categories: - 2016 - Geoduck Genome Sequencing tags: - BGI - geoduck - jupyter notebook - Panopea generosa --- Received another set of _Panopea generosa_ genome assembly data from BGI back in May! I neglected to create MD5 checksums, as well as a readme file for this data set! Of course, I needed some of the info that the readme file should've had and it wasn't there. So, here's the skinny... It's data assembled from the small insert libraries they created for this project. All data is stored here: [https://owl.fish.washington.edu/P_generosa_genome_assemblies_BGI/20160512/](http://owl.fish.washington.edu/P_generosa_genome_assemblies_BGI/20160512/) They've provided a [Genome Survey (PDF)(https://owl.fish.washington.edu/P_generosa_genome_assemblies_BGI/20160512/20160512_F15FTSUSAT0328_genome_survey.pdf) that has some info about the data they've assembled. In it, is the estimated genome size: Geoduck genome size: 2972.9 Mb Additionally, there's a table breaking down the N50 distributions of scaffold and contig sizes. Data management stuff was performed in a Jupyter (iPython) notebook; see below. Jupyter Notebook: [20161025_Pgenerosa_Small_Library_Genome_Read_Counts.ipynb](https://github.com/sr320/LabDocs/blob/master/jupyter_nbs/sam/20161025_Pgenerosa_Small_Library_Genome_Read_Counts.ipynb)