---
author: Sam White
toc-title: Contents
toc-depth: 5
toc-location: left
date: 2018-05-09 19:13:53+00:00
layout: post
slug: read-mapping-mapping-illumina-data-to-geoduck-genome-assemblies-with-bowtie2
title: Read Mapping - Mapping Illumina Data to Geoduck Genome Assemblies with Bowtie2
categories:
  - 2018
  - Geoduck Genome Sequencing
tags:
  - assembly
  - bowtie2
  - geoduck
  - jupyter notebook
  - Panopea generosa
---

We have an upcoming meeting with Illumina to discuss how the geoduck genome project is coming along and to decide how we want to proceed.

So, we wanted to get a quick idea of how well our [geoduck assemblies](https://github.com/RobertsLab/project-geoduck-genome/wiki/Assemblies) are by performing some quick alignments using [Bowtie2](http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml).

Used the following assemblies as references:


  * [sn_ph_01](https://github.com/RobertsLab/project-geoduck-genome/wiki/Assemblies) : SuperNova assembly of 10x Genomics data


  * [sparse_03](https://github.com/RobertsLab/project-geoduck-genome/wiki/Assemblies) : SparseAssembler assembly of BGI and Illumina project data


  * [pga_02](https://github.com/RobertsLab/project-geoduck-genome/wiki/Assemblies) : Hi-C assembly of Phase Genomics data


The analysis is documented in a Jupyter Notebook.

Jupyter Notebook (GitHub):


  * [20180508_roadrunner_geoduck_bowtie2_genome_mapping.ipynb](https://github.com/sr320/LabDocs/blob/master/jupyter_nbs/sam/20180508_roadrunner_geoduck_bowtie2_genome_mapping.ipynb)


_NOTE: Due to large amount of stdout from first genome index command, the notebook does not render well on GitHub. I recommend downloading and opening notebook on a locally install version of Jupyter._

Here's a brief overview of the process:


  1. Generate Bowtie2 indexes for each of the genome assemblies.


  2. Map 1,000,000 reads from the following Illumina NovaSeq FastQ files:


  * [NR013_AD013_S2_L001_R1_001_val_1_val_1.fq.gz](https://owl.fish.washington.edu/Athaliana/20180129_trimgalore_geoduck_novaseq/NR013_AD013_S2_L001_R1_001_val_1_val_1.fq.gz)


  * [NR013_AD013_S2_L001_R2_001_val_2_val_2.fq.gz](https://owl.fish.washington.edu/Athaliana/20180129_trimgalore_geoduck_novaseq/NR013_AD013_S2_L001_R2_001_val_2_val_2.fq.gz)


* * *


##### Results:


Bowtie2 Genome Indexes:


  * [20180508_geoduck_assemblies_bowtie2_indexes/](https://owl.fish.washington.edu/Athaliana/20180508_geoduck_assemblies_bowtie2_indexes/)


Bowtie2 [sn_ph_01](https://github.com/RobertsLab/project-geoduck-genome/wiki/Assemblies) alignment folder:


  * [20180508_geoduck_mapping_nova_to_10x/](https://owl.fish.washington.edu/Athaliana/20180508_geoduck_mapping_nova_to_10x/)


Bowtie2 [sparse_03](https://github.com/RobertsLab/project-geoduck-genome/wiki/Assemblies) alignment folder:


  * [20180508_geoduck_mapping_nova_to_sparse/](https://owl.fish.washington.edu/Athaliana/20180508_geoduck_mapping_nova_to_sparse/)


Bowtie2 [pga_02](https://github.com/RobertsLab/project-geoduck-genome/wiki/Assemblies) alignment folder:


  * [20180508_geoduck_mapping_nova_to_Hi-C/](https://owl.fish.washington.edu/Athaliana/20180508_geoduck_mapping_nova_to_Hi-C/)


* * *


##### MAPPING SUMMARY TABLE


_All mapping data was pulled from the respective *.err file in the Bowtie2 alignment folders._

<table >

<tr >
  sequence_ID
  Assembler
  Alignment Rate (%)
</tr>

<tbody >
<tr >
  
<td >[sn_ph_01](https://github.com/RobertsLab/project-geoduck-genome/wiki/Assemblies)
</td>
  
<td >SuperNova (10x)
</td>
  
<td >79.89
</td>
</tr>
<tr >
  
<td >[sparse_03](https://github.com/RobertsLab/project-geoduck-genome/wiki/Assemblies)
</td>
  
<td >SparseAssembler
</td>
  
<td >85.83
</td>
</tr>
<tr >
  
<td >[pga_02](https://github.com/RobertsLab/project-geoduck-genome/wiki/Assemblies)
</td>
  
<td >Hi-C (Phase Genomics)
</td>
  
<td >79.90|
</td>
</tr>
</tbody>
</table>

Mapping efficiency is similar for all assemblies. After speaking with Steven, we've decided we'll begin exploring genome annotation pipelines.
[sn_ph_01](https://github.com/RobertsLab/project-geoduck-genome/wiki/Assemblies)	SuperNova (10x)	79.89
[sparse_03](https://github.com/RobertsLab/project-geoduck-genome/wiki/Assemblies)	SparseAssembler	85.83
[pga_02](https://github.com/RobertsLab/project-geoduck-genome/wiki/Assemblies)	Hi-C (Phase Genomics)	79.90\|