--- author: Sam White toc-title: Contents toc-depth: 5 toc-location: left date: 2016-12-15 22:46:19+00:00 layout: post slug: data-management-integrity-check-of-final-bgi-olympia-oyster-geoduck-data title: Data Management - Integrity Check of Final BGI Olympia Oyster & Geoduck Data categories: - 2016 - Geoduck Genome Sequencing - Olympia Oyster Genome Sequencing tags: - array - bash - BGI - checksum - geoduck - jupyter notebook - md5 - olympia oyster - Ostrea lurida - ostrich - Panopea generosa - parameter substitution --- After completing the downloads of these files from BGI, I needed to verify that the downloaded copies matched the originals. Below is a Jupyter Notebook detailing how I verified file integrity via MD5 checksums. It also highlights the importance of doing this check when working with large sequencing files (or, just large files in general), as a few of them had mis-matching MD5 checksums! Although the notebook is embedded below, it might be easier viewing via the notebook link (hosted on GitHub). At the end of the day, I had to re-download some files, but all the MD5 checksums match and these data are ready for analysis: [Final _Ostrea lurida_ genome files](https://owl.fish.washington.edu/O_lurida_genome_assemblies_BGI/20161201/cdts-hk.genomics.cn/Ostrea_lurida/) [Final _Panopea generosa_ genome files](https://owl.fish.washington.edu/P_generosa_genome_assemblies_BGI/20161201/cdts-hk.genomics.cn/Panopea_generosa/) Jupyter Notebook: [20161214_docker_BGI_data_integrity_check.ipynb](https://github.com/sr320/LabDocs/blob/master/jupyter_nbs/sam/20161214_docker_BGI_data_integrity_check.ipynb)