--- author: Sam White toc-title: Contents toc-depth: 5 toc-location: left date: 2018-09-27 00:17:57+00:00 layout: post slug: transcriptome-alignment-bedgraph-olympia-oyster-transcriptome-with-olurida_v080-genome-assembly title: Transcriptome Alignment & Bedgraph – Olympia oyster transcriptome with Olurida_v080 genome assembly categories: - 2018 - Miscellaneous tags: - alignment - bedgraph - bedtools - hisat2 - mox - Olurida_v080 - olympia oyster - Ostrea lurida - samtools --- [Yesterday, I produced a bedgraph file of our Olympia oyster RNAseq data coverage using our Olurida_v081 genome.](https://robertslab.github.io/sams-notebook/posts/2018/2018-09-26-bedgraph-olympia-oyster-transcriptome-with-olurida_v081-genome-assembly/) I decided that I wanted to use the Olurida_v080 version instead (or, in addtion to?), as the Olurida_v080 version has not been size restricted (the Olurida v081 version is only contigs >1000bp). I feel like we could miss some important regions, so wanted to run this analysis using all of the genome data we currently have available. Additionally, this will be consistent with [my previous Bismark (DNA methylation analysis)(2018/09/13/dna-methylation-analysis-olympia-oyster-whole-genome-bsseq-bismark-pipeline-comparison.html). Used HISAT2 on our HPC Mox node to align our RNAseq reads to our Olurida_v080 genome assembly: * [Olurida_v080.fa](https://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v080.fa) SBATCH script file: * [20180926_oly_RNAseq_bedgraphs.sh](https://owl.fish.washington.edu/Athaliana/20180926_oly_RNAseq_genome_hisat2_bedgraph.sh) NOTE: For brevity sake, I have _not_ listed all of the input RNAseq files below. Please see the full script, which is linked above. #!/bin/bash ## Job Name #SBATCH --job-name=20180926_oly_hisat2 ## Allocation Definition #SBATCH --account=srlab #SBATCH --partition=srlab ## Resources ## Nodes #SBATCH --nodes=1 ## Walltime (days-hours:minutes:seconds format) #SBATCH --time=5-00:00:00 ## Memory per node #SBATCH --mem=500G ##turn on e-mail notification #SBATCH --mail-type=ALL #SBATCH --mail-user=samwhite@uw.edu ## Specify the working directory for this job #SBATCH --workdir=/gscratch/scrubbed/samwhite/20180926_oly_RNAseq_genome_hisat2_bedgraph # Load Python Mox module for Python module availability module load intel-python3_2017 # Document programs in PATH (primarily for program version ID) date >> system_path.log echo "" >> system_path.log echo "System PATH for $SLURM_JOB_ID" >> system_path.log echo "" >> system_path.log printf "%0.s-" {1..10} >> system_path.log echo ${PATH} | tr : \\n >> system_path.log # Set genome assembly path oly_genome_path=/gscratch/srlab/sam/data/O_lurida/oly_genome_assemblies # Set sorted transcriptome assembly bam file oly_transcriptome_bam=20180926_Olurida_v080.sorted.bam # Set hisat2 basename hisat2_basename=Olurida_v080 # Set program paths ## hisat2 hisat2=/gscratch/srlab/programs/hisat2-2.1.0 ## bedtools bedtools=/gscratch/srlab/programs/bedtools-2.27.1/bin ## samtools stools=/gscratch/srlab/programs/samtools-1.9/samtools # Build hisat2 genome index ${hisat2}/hisat2-build \ -f ${oly_genome_path}/Olurida_v080.fa \ Olurida_v080 \ -p 28 # Align reads to oly genome assembly ${hisat2}/hisat2 \ --threads 28 \ -x "${hisat2_basename}" \ -q \ -1 \ -2 \ -S 20180926_"${hisat2_basename}".sam # Convert SAM file to BAM "${stools}" view \ --threads 28 \ -b 20180926_"${hisat2_basename}".sam > 20180926_"${hisat2_basename}".bam # Sort BAM "${stools}" sort \ --threads 28 \ 20180926_"${hisat2_basename}".bam \ -o 20180926_"${hisat2_basename}".sorted.bam # Index for use in IGV ##-@ specifies thread count; --thread option not available in samtools index "${stools}" index \ -@ 28 \ 20180926_"${hisat2_basename}".sorted.bam # Create bedgraph ## Reports depth at each position (-bg in bedgraph format) and report regions with zero coverage (-a). ## Screens for portions of reads coming from exons (-split). ## Add genome browser track line to header of bedgraph file. ${bedtools}/genomeCoverageBed \ -ibam ${oly_transcriptome_bam} \ -bga \ -split \ -trackline \ > 20180926_oly_RNAseq.bedgraph The script performs the following functions: * Genome indexing * RNAseq alignment to genome * Convert SAM to BAM * Sort and index BAM * Determine RNAseq coverage * * * # RESULTS Output folder: * [20180926_oly_RNAseq_genome_hisat2_bedgraph/](https://owl.fish.washington.edu/Athaliana/20180926_oly_RNAseq_genome_hisat2_bedgraph) Bedgraph file (1.9GB): * [20180926_oly_RNAseq_genome_hisat2_bedgraph/20180926_oly_RNAseq.bedgraph](https://owl.fish.washington.edu/Athaliana/20180926_oly_RNAseq_genome_hisat2_bedgraph/20180926_oly_RNAseq.bedgraph) Loaded in to IGV to verify things looked OK: ![](https://owl.fish.washington.edu/Athaliana/20180926_oly_RNAseq_coverage_Oluridav080_IGV.png)