--- author: Sam White toc-title: Contents toc-depth: 5 toc-location: left date: 2018-05-09 19:13:53+00:00 layout: post slug: read-mapping-mapping-illumina-data-to-geoduck-genome-assemblies-with-bowtie2 title: Read Mapping - Mapping Illumina Data to Geoduck Genome Assemblies with Bowtie2 categories: - 2018 - Geoduck Genome Sequencing tags: - assembly - bowtie2 - geoduck - jupyter notebook - Panopea generosa --- We have an upcoming meeting with Illumina to discuss how the geoduck genome project is coming along and to decide how we want to proceed. So, we wanted to get a quick idea of how well our [geoduck assemblies](https://github.com/RobertsLab/project-geoduck-genome/wiki/Assemblies) are by performing some quick alignments using [Bowtie2](http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml). Used the following assemblies as references: * [sn_ph_01](https://github.com/RobertsLab/project-geoduck-genome/wiki/Assemblies) : SuperNova assembly of 10x Genomics data * [sparse_03](https://github.com/RobertsLab/project-geoduck-genome/wiki/Assemblies) : SparseAssembler assembly of BGI and Illumina project data * [pga_02](https://github.com/RobertsLab/project-geoduck-genome/wiki/Assemblies) : Hi-C assembly of Phase Genomics data The analysis is documented in a Jupyter Notebook. Jupyter Notebook (GitHub): * [20180508_roadrunner_geoduck_bowtie2_genome_mapping.ipynb](https://github.com/sr320/LabDocs/blob/master/jupyter_nbs/sam/20180508_roadrunner_geoduck_bowtie2_genome_mapping.ipynb) _NOTE: Due to large amount of stdout from first genome index command, the notebook does not render well on GitHub. I recommend downloading and opening notebook on a locally install version of Jupyter._ Here's a brief overview of the process: 1. Generate Bowtie2 indexes for each of the genome assemblies. 2. Map 1,000,000 reads from the following Illumina NovaSeq FastQ files: * [NR013_AD013_S2_L001_R1_001_val_1_val_1.fq.gz](https://owl.fish.washington.edu/Athaliana/20180129_trimgalore_geoduck_novaseq/NR013_AD013_S2_L001_R1_001_val_1_val_1.fq.gz) * [NR013_AD013_S2_L001_R2_001_val_2_val_2.fq.gz](https://owl.fish.washington.edu/Athaliana/20180129_trimgalore_geoduck_novaseq/NR013_AD013_S2_L001_R2_001_val_2_val_2.fq.gz) * * * ##### Results: Bowtie2 Genome Indexes: * [20180508_geoduck_assemblies_bowtie2_indexes/](https://owl.fish.washington.edu/Athaliana/20180508_geoduck_assemblies_bowtie2_indexes/) Bowtie2 [sn_ph_01](https://github.com/RobertsLab/project-geoduck-genome/wiki/Assemblies) alignment folder: * [20180508_geoduck_mapping_nova_to_10x/](https://owl.fish.washington.edu/Athaliana/20180508_geoduck_mapping_nova_to_10x/) Bowtie2 [sparse_03](https://github.com/RobertsLab/project-geoduck-genome/wiki/Assemblies) alignment folder: * [20180508_geoduck_mapping_nova_to_sparse/](https://owl.fish.washington.edu/Athaliana/20180508_geoduck_mapping_nova_to_sparse/) Bowtie2 [pga_02](https://github.com/RobertsLab/project-geoduck-genome/wiki/Assemblies) alignment folder: * [20180508_geoduck_mapping_nova_to_Hi-C/](https://owl.fish.washington.edu/Athaliana/20180508_geoduck_mapping_nova_to_Hi-C/) * * * ##### MAPPING SUMMARY TABLE _All mapping data was pulled from the respective *.err file in the Bowtie2 alignment folders._ sequence_ID Assembler Alignment Rate (%)
[sn_ph_01](https://github.com/RobertsLab/project-geoduck-genome/wiki/Assemblies) SuperNova (10x) 79.89
[sparse_03](https://github.com/RobertsLab/project-geoduck-genome/wiki/Assemblies) SparseAssembler 85.83
[pga_02](https://github.com/RobertsLab/project-geoduck-genome/wiki/Assemblies) Hi-C (Phase Genomics) 79.90|
Mapping efficiency is similar for all assemblies. After speaking with Steven, we've decided we'll begin exploring genome annotation pipelines.