--- author: Sam White toc-title: Contents toc-depth: 5 toc-location: left date: 2018-09-24 23:23:14+00:00 layout: post slug: bedgraph-olympia-oyster-transcriptome-fail title: Bedgraph - Olympia oyster transcriptome (FAIL) categories: - 2018 - Miscellaneous tags: - alignment - bedgraph - bedtools - mox - olympia oyster - Ostrea lurida - samtools - transcriptome --- Progress on generating [bedgraphs from our Olympia oyster transcriptome](https://github.com/RobertsLab/resources/issues/356) continues. [Transcriptome assembly with Trinity completed 20180919](https://robertslab.github.io/sams-notebook/posts/2018/2018-09-19-transcriptome-assembly-olympia-oyster-rnaseq-data-with-trinity/). Then, [aligned the assembled transcriptome to our genome using Bowtie2](https://robertslab.github.io/sams-notebook/posts/2018/2018-09-20-transcriptome-alignment-olympia-oyster-trinity-transcriptome-aligned-to-genome-with-bowtie2/). Finally, I used BEDTools to convert the BAM to BED to bedgraph. This required an initial indexing of our Olympia oyster genome FastA using samtools faidx tool. SBATCH script file: * [20180924_oly_RNAseq_bedgraphs.sh](https://owl.fish.washington.edu/Athaliana/20180924_oly_RNAseq_bedgraphs/20180924_oly_RNAseq_bedgraphs.sh) #!/bin/bash ## Job Name #SBATCH --job-name=20180924_oly_bedgraphs ## Allocation Definition #SBATCH --account=srlab #SBATCH --partition=srlab ## Resources ## Nodes #SBATCH --nodes=1 ## Walltime (days-hours:minutes:seconds format) #SBATCH --time=5-00:00:00 ## Memory per node #SBATCH --mem=500G ##turn on e-mail notification #SBATCH --mail-type=ALL #SBATCH --mail-user=samwhite@uw.edu ## Specify the working directory for this job #SBATCH --workdir=/gscratch/scrubbed/samwhite/20180924_oly_RNAseq_bedgraphs # Load Python Mox module for Python module availability module load intel-python3_2017 # Document programs in PATH (primarily for program version ID) date >> system_path.log echo "" >> system_path.log echo "System PATH for $SLURM_JOB_ID" >> system_path.log echo "" >> system_path.log printf "%0.s-" {1..10} >> system_path.log echo ${PATH} | tr : \\n >> system_path.log # Set genome assembly FastA oly_genome_fasta=/gscratch/srlab/sam/data/O_lurida/oly_genome_assemblies/Olurida_v081.fa # Set indexed genome assembly file oly_genome_indexed=/gscratch/srlab/sam/data/O_lurida/oly_genome_assemblies/Olurida_v081.fa.fai # Set sorted transcriptome assembly bam file oly_transcriptome=/gscratch/scrubbed/samwhite/20180919_oly_transcriptome_bowtie2/20180919_Olurida_v081.sorted.bam # Set program paths bedtools=/gscratch/srlab/programs/bedtools-2.27.1/bin samtools=/gscratch/srlab/programs/samtools-1.9/samtools # Index genome FastA ${samtools} faidx ${oly_genome_fasta} # Format indexed genome for bedtools ## Requires only two columns: namelength awk -v OFS='\t' {'print $1,$2'} ${oly_genome_indexed} > Olurida_v081.fa.fai.genome # Create bed file ${bedtools}/bamToBed \ -i ${oly_transcriptome} \ > 20180924_oly_RNAseq.bam.bed # Create bedgraph ## Reports depth at each position (-bg in bedgraph format) and report regions with zero coverage (-a). ## Screens for portions of reads coming from exons (-split). ## Add genome browser track line to header of bedgraph file. ${bedtools}/genomeCoverageBed \ -i ${PWD}/20180924_oly_RNAseq.bed \ -g Olurida_v081.fa.fai.genome \ -bga \ -split \ -trackline \ > 20180924_oly_RNAseq.bed Alignment was done using the following version of the Olympia oyster genome assembly: * [Olurida_v081.fa](https://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v081.fa) * * * # RESULTS: Output folder: * [20180924_oly_RNAseq_bedgraphs/](https://owl.fish.washington.edu/Athaliana/20180924_oly_RNAseq_bedgraphs/) Indexed and formatted genome file: * [Olurida_v081.fa.fai.genome](https://owl.fish.washington.edu/Athaliana/20180924_oly_RNAseq_bedgraphs/Olurida_v081.fa.fai.genome) Bedgraph file (for IGV): * [20180924_oly_RNAseq.bed](https://owl.fish.washington.edu/Athaliana/20180924_oly_RNAseq_bedgraphs/20180924_oly_RNAseq.bed) * * * This doesn't appear to have worked properly. Here's a view of the bedgraph file: track type=bedGraph Contig0 0 116746 0 Contig1 0 87411 0 Contig2 0 139250 0 Contig3 0 141657 0 Contig4 0 95692 0 Contig5 0 130522 0 Contig6 0 94893 0 Contig7 0 109667 0 Contig8 0 95943 0 I'd expect multiple entries for each contig (ideally), indicating start/stop positions for where transcripts align within a given contig. However, this appears to simply be a list of all the genome contigs and their lengths (Start=0, Stop=n). I would expect to see something li I'll look into this further and see where this pipeline goes wrong.