---
author: Sam White
toc-title: Contents
toc-depth: 5
toc-location: left
date: 2018-09-24 23:23:14+00:00
layout: post
slug: bedgraph-olympia-oyster-transcriptome-fail
title: Bedgraph - Olympia oyster transcriptome (FAIL)
categories:
- 2018
- Miscellaneous
tags:
- alignment
- bedgraph
- bedtools
- mox
- olympia oyster
- Ostrea lurida
- samtools
- transcriptome
---
Progress on generating [bedgraphs from our Olympia oyster transcriptome](https://github.com/RobertsLab/resources/issues/356) continues.
[Transcriptome assembly with Trinity completed 20180919](https://robertslab.github.io/sams-notebook/posts/2018/2018-09-19-transcriptome-assembly-olympia-oyster-rnaseq-data-with-trinity/).
Then, [aligned the assembled transcriptome to our genome using Bowtie2](https://robertslab.github.io/sams-notebook/posts/2018/2018-09-20-transcriptome-alignment-olympia-oyster-trinity-transcriptome-aligned-to-genome-with-bowtie2/).
Finally, I used BEDTools to convert the BAM to BED to bedgraph.
This required an initial indexing of our Olympia oyster genome FastA using samtools faidx tool.
SBATCH script file:
* [20180924_oly_RNAseq_bedgraphs.sh](https://owl.fish.washington.edu/Athaliana/20180924_oly_RNAseq_bedgraphs/20180924_oly_RNAseq_bedgraphs.sh)
#!/bin/bash
## Job Name
#SBATCH --job-name=20180924_oly_bedgraphs
## Allocation Definition
#SBATCH --account=srlab
#SBATCH --partition=srlab
## Resources
## Nodes
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=5-00:00:00
## Memory per node
#SBATCH --mem=500G
##turn on e-mail notification
#SBATCH --mail-type=ALL
#SBATCH --mail-user=samwhite@uw.edu
## Specify the working directory for this job
#SBATCH --workdir=/gscratch/scrubbed/samwhite/20180924_oly_RNAseq_bedgraphs
# Load Python Mox module for Python module availability
module load intel-python3_2017
# Document programs in PATH (primarily for program version ID)
date >> system_path.log
echo "" >> system_path.log
echo "System PATH for $SLURM_JOB_ID" >> system_path.log
echo "" >> system_path.log
printf "%0.s-" {1..10} >> system_path.log
echo ${PATH} | tr : \\n >> system_path.log
# Set genome assembly FastA
oly_genome_fasta=/gscratch/srlab/sam/data/O_lurida/oly_genome_assemblies/Olurida_v081.fa
# Set indexed genome assembly file
oly_genome_indexed=/gscratch/srlab/sam/data/O_lurida/oly_genome_assemblies/Olurida_v081.fa.fai
# Set sorted transcriptome assembly bam file
oly_transcriptome=/gscratch/scrubbed/samwhite/20180919_oly_transcriptome_bowtie2/20180919_Olurida_v081.sorted.bam
# Set program paths
bedtools=/gscratch/srlab/programs/bedtools-2.27.1/bin
samtools=/gscratch/srlab/programs/samtools-1.9/samtools
# Index genome FastA
${samtools} faidx ${oly_genome_fasta}
# Format indexed genome for bedtools
## Requires only two columns: namelength
awk -v OFS='\t' {'print $1,$2'} ${oly_genome_indexed} > Olurida_v081.fa.fai.genome
# Create bed file
${bedtools}/bamToBed \
-i ${oly_transcriptome} \
> 20180924_oly_RNAseq.bam.bed
# Create bedgraph
## Reports depth at each position (-bg in bedgraph format) and report regions with zero coverage (-a).
## Screens for portions of reads coming from exons (-split).
## Add genome browser track line to header of bedgraph file.
${bedtools}/genomeCoverageBed \
-i ${PWD}/20180924_oly_RNAseq.bed \
-g Olurida_v081.fa.fai.genome \
-bga \
-split \
-trackline \
> 20180924_oly_RNAseq.bed
Alignment was done using the following version of the Olympia oyster genome assembly:
* [Olurida_v081.fa](https://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v081.fa)
* * *
# RESULTS:
Output folder:
* [20180924_oly_RNAseq_bedgraphs/](https://owl.fish.washington.edu/Athaliana/20180924_oly_RNAseq_bedgraphs/)
Indexed and formatted genome file:
* [Olurida_v081.fa.fai.genome](https://owl.fish.washington.edu/Athaliana/20180924_oly_RNAseq_bedgraphs/Olurida_v081.fa.fai.genome)
Bedgraph file (for IGV):
* [20180924_oly_RNAseq.bed](https://owl.fish.washington.edu/Athaliana/20180924_oly_RNAseq_bedgraphs/20180924_oly_RNAseq.bed)
* * *
This doesn't appear to have worked properly. Here's a view of the bedgraph file:
track type=bedGraph
Contig0 0 116746 0
Contig1 0 87411 0
Contig2 0 139250 0
Contig3 0 141657 0
Contig4 0 95692 0
Contig5 0 130522 0
Contig6 0 94893 0
Contig7 0 109667 0
Contig8 0 95943 0
I'd expect multiple entries for each contig (ideally), indicating start/stop positions for where transcripts align within a given contig. However, this appears to simply be a list of all the genome contigs and their lengths (Start=0, Stop=n).
I would expect to see something li
I'll look into this further and see where this pipeline goes wrong.