---
author: Sam White
toc-title: Contents
toc-depth: 5
toc-location: left
date: 2018-09-04 19:31:10+00:00
layout: post
slug: transcriptome-assembly-geoduck-rnaseq-data
title: Transcriptome Assembly - Geoduck RNAseq data
categories:
  - 2018
  - Miscellaneous
tags:
  - geoduck
  - mox
  - Panopea generosa
  - RNAseq
  - Trimmomatic
  - trinity
---

Used all of our current geoduck RNAseq data to assemble a transcriptome using Trinity.

Trinity was run our our Mox HPC node. Specifically, I had to use just a single node with 500GB of RAM. Trinity could not run with much less than that. Initially, I attempted to run with two nodes, but our smaller node (120GB) ended up limiting the available RAM (the system only uses the RAM available on the smallest node; it cannot combine RAM or dynamically allocate computing to a node with larger RAM when needed) and Trinity consistently crashed due to memory limitations.

Reads were trimmed using the built-in version of Trimmomatic with the default settings.

SBATCH script:


  * [20180827_geo_trinity.sh](https://owl.fish.washington.edu/Athaliana/20180827_trinity_geoduck_RNAseq/20180827_geo_trinity.sh)


Due to the huge number of input files, I won't post the entire script contents here. Instead, here's a snippet of the script showing the commands used to start the Trinity run:


    #!/bin/bash
    ## Job Name
    #SBATCH --job-name=20180829_trinity
    ## Allocation Definition
    #SBATCH --account=srlab
    #SBATCH --partition=srlab
    ## Resources
    ## Nodes
    #SBATCH --nodes=1
    ## Walltime (days-hours:minutes:seconds format)
    #SBATCH --time=30-00:00:00
    ## Memory per node
    #SBATCH --mem=500G
    ##turn on e-mail notification
    #SBATCH --mail-type=ALL
    #SBATCH --mail-user=samwhite@uw.edu
    ## Specify the working directory for this job
    #SBATCH --workdir=/gscratch/scrubbed/samwhite/20180827_trinity_geoduck_RNAseq

    # Load Python Mox module for Python module availability

    module load intel-python3_2017

    # Document programs in PATH (primarily for program version ID)

    date >> system_path.log
    echo "" >> system_path.log
    echo "System PATH for $SLURM_JOB_ID" >> system_path.log
    echo "" >> system_path.log
    printf "%0.s-" {1..10} >> system_path.log
    echo ${PATH} | tr : \\n >> system_path.log


    # Run Trinity
    /gscratch/srlab/programs/trinityrnaseq-Trinity-v2.8.3/Trinity \
    --trimmomatic \
    --seqType fq \
    --max_memory 500G \
    --CPU 28 \


Despite the naming conventions, this job was submitted to the Mox scheduler on 20180829 and finished on 20180901.

After job completion, the entire folder was gzipped (the following method of gzipping is SUPER fast, btw):


    tar -c 20180827_trinity_geoduck_RNAseq | pigz > 20180827_trinity_geoduck_RNAseq.tar.gz


* * *


# RESULTS:


Output folder:


  * [20180827_trinity_geoduck_RNAseq/](https://owl.fish.washington.edu/Athaliana/20180827_trinity_geoduck_RNAseq/)


Trinity assembly (FastA):


  * [20180827_trinity_geoduck_RNAseq/Trinity.fasta](https://owl.fish.washington.edu/Athaliana/20180827_trinity_geoduck_RNAseq/Trinity.fasta)


Next up, I'll get some annotations going by running through TransDecoder and blastx.