--- author: Sam White toc-title: Contents toc-depth: 5 toc-location: left layout: post title: Transcriptome Assembly - Geoduck Tissue-specific Assembly Juvenile Super Low OA EPI116 with HiSeq and NovaSeq Data on Mox date: '2019-04-09 06:52' tags: - Panopea generosa - geoduck - epi116 - OA - juvenile - mox - trinity - assembly - transcriptome categories: - 2019 - Miscellaneous --- Ran a _de novo_ assembly on our HiSeq and NovaSeq data from [Hollie's juvenile EPI 116 sample "super low OA"](https://github.com/hputnam/project_juvenile_geoduck_OA/blob/master/Setup_Notes/Sample_List.csv). This was done for Christian to use in some long, non-coding RNA (lncRNA) analysis. NovaSeq data had been [previously trimmed](https://robertslab.github.io/sams-notebook/posts/2018/2018-01-25-adapter-trimming-and-fastqc-illumina-geoduck-novaseq-data/). Trimming of the HiSeq data was performed via Trinity, using the `--trimmomatic` option. SBATCH script (GitHub): - [20190409_trinity_pgen_EPI116_RNAseq.sh](https://github.com/RobertsLab/sams-notebook/blob/master/sbatch_scripts/20190409_trinity_pgen_EPI116_RNAseq.sh) ```shell #!/bin/bash ## Job Name #SBATCH --job-name=trin_epi116 ## Allocation Definition #SBATCH --account=coenv #SBATCH --partition=coenv ## Resources ## Nodes #SBATCH --nodes=1 ## Walltime (days-hours:minutes:seconds format) #SBATCH --time=30-00:00:00 ## Memory per node #SBATCH --mem=120G ##turn on e-mail notification #SBATCH --mail-type=ALL #SBATCH --mail-user=samwhite@uw.edu ## Specify the working directory for this job #SBATCH --workdir=/gscratch/scrubbed/samwhite/outputs/20190409_trinity_pgen_EPI116_RNAseq # Exit script if a command fails set -e # Load Python Mox module for Python module availability module load intel-python3_2017 # Document programs in PATH (primarily for program version ID) date >> system_path.log echo "" >> system_path.log echo "System PATH for $SLURM_JOB_ID" >> system_path.log echo "" >> system_path.log printf "%0.s-" {1..10} >> system_path.log echo ${PATH} | tr : \\n >> system_path.log # User-defined variables reads_dir=/gscratch/scrubbed/samwhite/data/P_generosa/RNAseq/epi_116 threads=28 assembly_stats=assembly_stats.txt # Paths to programs trinity_dir="/gscratch/srlab/programs/Trinity-v2.8.3" samtools="/gscratch/srlab/programs/samtools-1.9/samtools" ## Inititalize arrays R1_array=() R2_array=() # Variables for R1/R2 lists R1_list="" R2_list="" # Create array of fastq R1 files R1_array=(${reads_dir}/*_R1_*.gz) # Create array of fastq R2 files R2_array=(${reads_dir}/*_R2_*.gz) # Create list of fastq files used in analysis ## Uses parameter substitution to strip leading path from filename for fastq in ${reads_dir}/*.gz do echo ${fastq##*/} >> fastq.list.txt done # Create comma-separated lists of FastQ reads R1_list=$(echo ${R1_array[@]} | tr " " ",") R2_list=$(echo ${R2_array[@]} | tr " " ",") # Run Trinity ${trinity_dir}/Trinity \ --trimmomatic \ --seqType fq \ --max_memory 120G \ --CPU ${threads} \ --left \ ${R1_list} \ --right \ ${R2_list} # Assembly stats ${trinity_dir}/util/TrinityStats.pl trinity_out_dir/Trinity.fasta \ > ${assembly_stats} # Create gene map files ${trinity_dir}/util/support_scripts/get_Trinity_gene_to_trans_map.pl \ trinity_out_dir/Trinity.fasta \ > trinity_out_dir/Trinity.fasta.gene_trans_map # Create FastA index ${samtools} faidx \ trinity_out_dir/Trinity.fasta ``` --- # RESULTS Output folder: - [20190409_trinity_pgen_EPI116_RNAseq/](http://gannet.fish.washington.edu/Atumefaciens/20190409_trinity_pgen_EPI116_RNAseq/) Trinity FastA: - [20190409_trinity_pgen_EPI116_RNAseq/trinity_out_dir/Trinity.fasta](http://gannet.fish.washington.edu/Atumefaciens/20190409_trinity_pgen_EPI116_RNAseq/trinity_out_dir/Trinity.fasta) Trinity FastA index file: - [20190409_trinity_pgen_EPI116_RNAseq/trinity_out_dir/Trinity.fasta.fai](20190409_trinity_pgen_EPI116_RNAseq/trinity_out_dir/Trinity.fasta.fai) Trinity Gene Trans Map file: - [20190409_trinity_pgen_EPI116_RNAseq/trinity_out_dir/Trinity.fasta.gene_trans_map](http://gannet.fish.washington.edu/Atumefaciens/20190409_trinity_pgen_EPI116_RNAseq/trinity_out_dir/Trinity.fasta.gene_trans_map) Assembly stats (text): - [20190409_trinity_pgen_EPI116_RNAseq/assembly_stats.txt](http://gannet.fish.washington.edu/Atumefaciens/20190409_trinity_pgen_EPI116_RNAseq/assembly_stats.txt) ``` ################################ ## Counts of transcripts, etc. ################################ Total trinity 'genes': 268476 Total trinity transcripts: 434877 Percent GC: 36.27 ######################################## Stats based on ALL transcript contigs: ######################################## Contig N10: 6036 Contig N20: 4167 Contig N30: 3076 Contig N40: 2287 Contig N50: 1663 Median contig length: 433 Average contig: 882.19 Total assembled bases: 383643405 ##################################################### ## Stats based on ONLY LONGEST ISOFORM per 'GENE': ##################################################### Contig N10: 4752 Contig N20: 3103 Contig N30: 2179 Contig N40: 1532 Contig N50: 1050 Median contig length: 366 Average contig: 679.12 Total assembled bases: 182327573 ``` List of input FastQs (text): - [20190409_trinity_pgen_EPI116_RNAseq/fastq.list.txt](http://gannet.fish.washington.edu/Atumefaciens/20190409_trinity_pgen_EPI116_RNAseq/fastq.list.txt) ``` Geoduck-juvenile-OA-exposure-RNA-EPI-116-1_S5_L001_R1_001.fastq.gz Geoduck-juvenile-OA-exposure-RNA-EPI-116-1_S5_L001_R2_001.fastq.gz Geoduck-juvenile-OA-exposure-RNA-EPI-116-2_S13_L002_R1_001.fastq.gz Geoduck-juvenile-OA-exposure-RNA-EPI-116-2_S13_L002_R2_001.fastq.gz Geoduck-juvenile-OA-exposure-RNA-EPI-116-3_S21_L003_R1_001.fastq.gz Geoduck-juvenile-OA-exposure-RNA-EPI-116-3_S21_L003_R2_001.fastq.gz Geoduck-juvenile-OA-exposure-RNA-EPI-116-4_S29_L004_R1_001.fastq.gz Geoduck-juvenile-OA-exposure-RNA-EPI-116-4_S29_L004_R2_001.fastq.gz Geoduck-juvenile-OA-exposure-RNA-EPI-116-5_S37_L005_R1_001.fastq.gz Geoduck-juvenile-OA-exposure-RNA-EPI-116-5_S37_L005_R2_001.fastq.gz Geoduck-juvenile-OA-exposure-RNA-EPI-116-6_S45_L006_R1_001.fastq.gz Geoduck-juvenile-OA-exposure-RNA-EPI-116-6_S45_L006_R2_001.fastq.gz Geoduck-juvenile-OA-exposure-RNA-EPI-116-7_S53_L007_R1_001.fastq.gz Geoduck-juvenile-OA-exposure-RNA-EPI-116-7_S53_L007_R2_001.fastq.gz Geoduck-juvenile-OA-exposure-RNA-EPI-116-8_S61_L008_R1_001.fastq.gz Geoduck-juvenile-OA-exposure-RNA-EPI-116-8_S61_L008_R2_001.fastq.gz NR005_S4_L001_R1_001_val_1_val_1.fq.gz NR005_S4_L001_R2_001_val_2_val_2.fq.gz NR005_S4_L002_R1_001_val_1_val_1.fq.gz NR005_S4_L002_R2_001_val_2_val_2.fq.gz ```