---
author: Sam White
toc-title: Contents
toc-depth: 5
toc-location: left
date: 2018-09-24 23:23:14+00:00
layout: post
slug: bedgraph-olympia-oyster-transcriptome-fail
title: Bedgraph - Olympia oyster transcriptome (FAIL)
categories:
  - 2018
  - Miscellaneous
tags:
  - alignment
  - bedgraph
  - bedtools
  - mox
  - olympia oyster
  - Ostrea lurida
  - samtools
  - transcriptome
---

Progress on generating [bedgraphs from our Olympia oyster transcriptome](https://github.com/RobertsLab/resources/issues/356) continues.

[Transcriptome assembly with Trinity completed 20180919](https://robertslab.github.io/sams-notebook/posts/2018/2018-09-19-transcriptome-assembly-olympia-oyster-rnaseq-data-with-trinity/).

Then, [aligned the assembled transcriptome to our genome using Bowtie2](https://robertslab.github.io/sams-notebook/posts/2018/2018-09-20-transcriptome-alignment-olympia-oyster-trinity-transcriptome-aligned-to-genome-with-bowtie2/).

Finally, I used BEDTools to convert the BAM to BED to bedgraph.

This required an initial indexing of our Olympia oyster genome FastA using samtools faidx tool.

SBATCH script file:


  * [20180924_oly_RNAseq_bedgraphs.sh](https://owl.fish.washington.edu/Athaliana/20180924_oly_RNAseq_bedgraphs/20180924_oly_RNAseq_bedgraphs.sh)


    <code>
    #!/bin/bash
    ## Job Name
    #SBATCH --job-name=20180924_oly_bedgraphs
    ## Allocation Definition 
    #SBATCH --account=srlab
    #SBATCH --partition=srlab
    ## Resources
    ## Nodes
    #SBATCH --nodes=1
    ## Walltime (days-hours:minutes:seconds format)
    #SBATCH --time=5-00:00:00
    ## Memory per node
    #SBATCH --mem=500G
    ##turn on e-mail notification
    #SBATCH --mail-type=ALL
    #SBATCH --mail-user=samwhite@uw.edu
    ## Specify the working directory for this job
    #SBATCH --workdir=/gscratch/scrubbed/samwhite/20180924_oly_RNAseq_bedgraphs
    
    # Load Python Mox module for Python module availability
    
    module load intel-python3_2017
    
    # Document programs in PATH (primarily for program version ID)
    
    date >> system_path.log
    echo "" >> system_path.log
    echo "System PATH for $SLURM_JOB_ID" >> system_path.log
    echo "" >> system_path.log
    printf "%0.s-" {1..10} >> system_path.log
    echo ${PATH} | tr : \\n >> system_path.log
    
    
    # Set genome assembly FastA
    oly_genome_fasta=/gscratch/srlab/sam/data/O_lurida/oly_genome_assemblies/Olurida_v081.fa
    
    # Set indexed genome assembly file
    oly_genome_indexed=/gscratch/srlab/sam/data/O_lurida/oly_genome_assemblies/Olurida_v081.fa.fai
    
    # Set sorted transcriptome assembly bam file
    oly_transcriptome=/gscratch/scrubbed/samwhite/20180919_oly_transcriptome_bowtie2/20180919_Olurida_v081.sorted.bam
    
    
    # Set program paths
    bedtools=/gscratch/srlab/programs/bedtools-2.27.1/bin
    samtools=/gscratch/srlab/programs/samtools-1.9/samtools
    
    # Index genome FastA
    ${samtools} faidx ${oly_genome_fasta}
    
    # Format indexed genome for bedtools
    ## Requires only two columns: name<tab>length
    awk -v OFS='\t' {'print $1,$2'} ${oly_genome_indexed} > Olurida_v081.fa.fai.genome
    
    # Create bed file
    ${bedtools}/bamToBed \
    -i ${oly_transcriptome} \
    > 20180924_oly_RNAseq.bam.bed
    
    
    # Create bedgraph
    ## Reports depth at each position (-bg in bedgraph format) and report regions with zero coverage (-a).
    ## Screens for portions of reads coming from exons (-split).
    ## Add genome browser track line to header of bedgraph file.
    ${bedtools}/genomeCoverageBed \
    -i ${PWD}/20180924_oly_RNAseq.bed \
    -g Olurida_v081.fa.fai.genome \
    -bga \
    -split \
    -trackline \
    > 20180924_oly_RNAseq.bed
    </code>


Alignment was done using the following version of the Olympia oyster genome assembly:


  * [Olurida_v081.fa](https://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v081.fa)


* * *


# RESULTS:


Output folder:


  * [20180924_oly_RNAseq_bedgraphs/](https://owl.fish.washington.edu/Athaliana/20180924_oly_RNAseq_bedgraphs/)


Indexed and formatted genome file:


  * [Olurida_v081.fa.fai.genome](https://owl.fish.washington.edu/Athaliana/20180924_oly_RNAseq_bedgraphs/Olurida_v081.fa.fai.genome)


Bedgraph file (for IGV):


  * [20180924_oly_RNAseq.bed](https://owl.fish.washington.edu/Athaliana/20180924_oly_RNAseq_bedgraphs/20180924_oly_RNAseq.bed)


* * *


This doesn't appear to have worked properly. Here's a view of the bedgraph file:


    <code>
    track type=bedGraph
    Contig0 0   116746  0
    Contig1 0   87411   0
    Contig2 0   139250  0
    Contig3 0   141657  0
    Contig4 0   95692   0
    Contig5 0   130522  0
    Contig6 0   94893   0
    Contig7 0   109667  0
    Contig8 0   95943   0
    </code>


I'd expect multiple entries for each contig (ideally), indicating start/stop positions for where transcripts align within a given contig. However, this appears to simply be a list of all the genome contigs and their lengths (Start=0, Stop=n).

I would expect to see something li

I'll look into this further and see where this pipeline goes wrong.