---
author: Sam White
toc-title: Contents
toc-depth: 5
toc-location: left
date: 2017-10-09 17:41:41+00:00
layout: post
slug: data-management-convert-oly-pacbio-h5-to-fastq
title: Data Management - Convert Oly PacBio H5 to FASTQ
categories:
  - 2017
  - Olympia Oyster Genome Sequencing
tags:
  - FASTQ
  - olympia oyster
  - Ostrea lurida
  - PacBio
---

After working with all of this Olympia oyster genome sequencing data, I remembered that we had an old, singular PacBio SMRT cell file (from June 2013). This file didn't seem to be included in any recent assemblies of Sean's or mine. This is most likely because we have it in the PacBio H5 format and not in FASTQ.

I installed [PacBio's pbh5tools](https://github.com/PacificBiosciences/pbh5tools) on my computer (swoose), converted the file and moved it to owl/nightingales/O_lurida


    <code>python bash5tools.py /mnt/owl/nightingales/O_lurida/m130619_081336_42134_c100525122550000001823081109281326_s1_p0.bas.h5 --outType fastq </code>


I generated an MD5 checksum and appended to the `checksums.md5` file in /owl/nightingales/O_lurida using the following command:


    <code>md5sum m130619_081336_42134_c100525122550000001823081109281326_s1_p0.fastq | awk '{print $2 " = " $1}' >> checksums.md5</code>


The command above pipes the output to awk to format the output to match the existing format of the `checksums.md5` file (i.e. filename = hash).

I've also updated our [Nightingales spreadsheet (Google Sheet)(https://docs.google.com/spreadsheets/d/1_XqIOPVHSBVGscnjzDSWUeRL7HUHXfaHxVzec-I-8Xk/edit?usp=sharing) to reflect this.

Will generate updated PacBio assemblies with Canu and/or Racon.