---
author: Sam White
toc-title: Contents
toc-depth: 5
toc-location: left
layout: post
title: Transcript Identification and Alignments - C.virginica RNAseq with NCBI Genome GCF_002022765.2 Using Hisat2 and Stringtie on Mox Again
date: '2023-08-21 14:41'
tags: 
  - hisat2
  - mox
  - Crassostrea virginica
  - stringtie
  - RNAseq
  - Eastern oyster
categories: 
  - 2023
  - Miscellaneous
---
In the process of generating expression matrices for [CEABIGR](https://github.com/sr320/ceabigr/tree/main/data) (GitHub repo), and in turn distance matrices for these samples, I realized a couple of things:

1. [DESeq2](https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html) objects make the distance matrices I want easy to produce: `assay()` extracts the (transformed) count matrix, which can be passed directly to base R's `dist()`.

2. [`StringTie`](https://ccb.jhu.edu/software/stringtie/) comes with a script (`prepDE.py3`) that will spit out expression matrices formatted for import into [DESeq2](https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html).

So, this should be easy, right? Of course not. LOL.

When trying to run the [`StringTie`](https://ccb.jhu.edu/software/stringtie/) `prepDE.py3` script on the [original output from Stringtie on 20220225](https://robertslab.github.io/sams-notebook/posts/2022/2022-02-25-Transcript-Identification-and-Alignments---C.virginica-RNAseq-with-NCBI-Genome-GCF_002022765.2-Using-Hisat2-and-Stringtie-on-Mox/) (notebook), I encountered a parsing error on the very first GTF file it read. Considering the GTF file(s) was/are generated by [`StringTie`](https://ccb.jhu.edu/software/stringtie/), it seemed odd that it wouldn't be able to parse them.

Doing some investigating, I came across a number of [issues in the Stringtie GitHub repo](https://github.com/gpertea/stringtie/issues) indicating that output from versions of Stringtie older than `v2.2.0`, generated with certain combinations of arguments (specifically, the `-e` argument seemed potentially problematic), could cause various errors when used with an updated version of [`StringTie`](https://ccb.jhu.edu/software/stringtie/). Well, I was doing just that! Was it causing the parsing error I was experiencing? Don't know, but I'd also like to avoid the other downstream errors potentially associated with the output(s) from the older version of Stringtie, so I've decided to re-run the analysis using the most current version of [`StringTie`](https://ccb.jhu.edu/software/stringtie/).

I used the [trimmed RNAseq reads from 20220224](https://robertslab.github.io/sams-notebook/posts/2022/2022-02-24-Trimming-Additional-20bp-from-C.virginica-Gonad-RNAseq-with-fastp-on-Mox/) (notebook).

I also needed to identify alternative transcripts in the [_Crassostrea virginica_ (Eastern oyster)](https://en.wikipedia.org/wiki/Eastern_oyster) gonad RNAseq data we have. I previously used [`HISAT2`](https://daehwankimlab.github.io/hisat2/) to index the NCBI [_Crassostrea virginica_ (Eastern oyster)](https://en.wikipedia.org/wiki/Eastern_oyster) genome and identify exon/splice sites [on 20210720](https://robertslab.github.io/sams-notebook/2021/07/20/Genome-Annotations-Splice-Site-and-Exon-Extractions-for-C.virginica-GCF_002022765.2-Genome-Using-Hisat2-on-Mox/). Then, I used this genome index with [`HISAT2`](https://daehwankimlab.github.io/hisat2/) to map the sequencing reads to the genome, and ran [`StringTie`](https://ccb.jhu.edu/software/stringtie/) on those alignments to identify alternative isoforms.

Job was run on Mox.

Skip to [RESULTS section](#results).
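Since the suspected culprit is output from a StringTie version older than `v2.2.0`, it can help to check which version produced each GTF before feeding them to `prepDE.py3`. Below is a minimal sketch, assuming each StringTie GTF carries a header comment like `# StringTie version 2.1.7` and that GNU `sort -V` is available (as on Mox). The function names (`gtf_stringtie_version`, `version_lt`, `check_gtf_list`) are hypothetical helpers, not part of StringTie.

```bash
#!/bin/bash
# Hypothetical helper: print the StringTie version recorded in a GTF header.
# Assumes a header comment line such as "# StringTie version 2.1.7".
gtf_stringtie_version() {
  local gtf="$1"
  # Stop at the first non-comment line; print the last field of the version line.
  awk '!/^#/ { exit } /StringTie version/ { print $NF; exit }' "${gtf}"
}

# Returns success (0) if version $1 is strictly older than version $2.
# Relies on GNU sort's version sorting (-V).
version_lt() {
  [[ "$1" != "$2" ]] \
    && [[ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" == "$1" ]]
}

# Hypothetical helper: report GTFs (one path per line in a list file)
# whose header version predates the cutoff (default: 2.2.0).
check_gtf_list() {
  local list="$1" cutoff="${2:-2.2.0}"
  local gtf version
  while read -r gtf; do
    version=$(gtf_stringtie_version "${gtf}")
    if [[ -z "${version}" ]]; then
      echo "${gtf}: no StringTie version found in header"
    elif version_lt "${version}" "${cutoff}"; then
      echo "${gtf}: StringTie ${version} (older than ${cutoff})"
    fi
  done < "${list}"
}
```

For example, `check_gtf_list gtf_list.txt` would print only the GTFs that look like they came from a pre-`v2.2.0` StringTie run (or that lack a version line), and nothing for up-to-date ones.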
SLURM Script (GitHub):

- [20230821-cvir-stringtie-GCF_002022765.2-isoforms.sh](https://github.com/RobertsLab/sams-notebook/blob/master/sbatch_scripts/20230821-cvir-stringtie-GCF_002022765.2-isoforms.sh)

```bash
#!/bin/bash
## Job Name
#SBATCH --job-name=20230821-cvir-stringtie-GCF_002022765.2-isoforms
## Allocation Definition
#SBATCH --account=srlab
#SBATCH --partition=srlab
## Resources
## Nodes
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=3-12:00:00
## Memory per node
#SBATCH --mem=500G
##turn on e-mail notification
#SBATCH --mail-type=ALL
#SBATCH --mail-user=samwhite@uw.edu
## Specify the working directory for this job
#SBATCH --chdir=/gscratch/scrubbed/samwhite/outputs/20230821-cvir-stringtie-GCF_002022765.2-isoforms

## Script using Stringtie with NCBI C.virginica genome assembly
## and HiSat2 index generated on 20210714.

## Expects FastQ input filenames to match _R[12].fastp-trim.20bp-5prime.20220224.fq.gz

## This is an updated run of the 20220225 StringTie analysis. The previous run used an
## outdated version of StringTie, which may have led to some downstream issues.
###################################################################################

# These variables need to be set by user

## Assign Variables

# Set number of CPUs to use
threads=28

# Index name for Hisat2 use
# Needs to match index name used in previous Hisat2 indexing step
genome_index_name="cvir_GCF_002022765.2"

# Location of Hisat2 index files
# Must keep variable name formatting, as it's used by HiSat2
HISAT2_INDEXES=$(pwd)
export HISAT2_INDEXES

# Paths to programs
hisat2_dir="/gscratch/srlab/programs/hisat2-2.1.0"
hisat2="${hisat2_dir}/hisat2"
samtools="/gscratch/srlab/programs/samtools-1.10/samtools"
stringtie="/gscratch/srlab/programs/stringtie-2.2.1.Linux_x86_64/stringtie"
prepDE="/gscratch/srlab/programs/stringtie-2.2.1.Linux_x86_64/prepDE.py3"

# Input/output files
genome_index_dir="/gscratch/srlab/sam/data/C_virginica/genomes"
genome_gff="${genome_index_dir}/GCF_002022765.2_C_virginica-3.0_genomic.gff"
fastq_dir="/gscratch/srlab/sam/data/C_virginica/RNAseq/"
gtf_list="gtf_list.txt"
merged_bam="20230821_cvir_stringtie_GCF_002022765-sorted-bams-merged.bam"

# Declare associative array of sample names and metadata
declare -A samples_associative_array=()

# Set total number of samples (NOT number of FastQ files)
total_samples=26

# Programs associative array
declare -A programs_array

programs_array=(
[hisat2]="${hisat2}" \
[prepDE]="${prepDE}" \
[samtools_index]="${samtools} index" \
[samtools_merge]="${samtools} merge" \
[samtools_sort]="${samtools} sort" \
[samtools_view]="${samtools} view" \
[stringtie]="${stringtie}"
)

###################################################################################################

# Exit script if any command fails
set -e

# Load Python Mox module for Python module availability
module load intel-python3_2017

## Load associative array
## Only need to use one set of reads to capture sample name

# Set sample counter for array verification
sample_counter=0

# Load array
for fastq in "${fastq_dir}"*_R1.fastp-trim.20bp-5prime.20220224.fq.gz
do
  # Increment counter
  ((sample_counter+=1))

  # Remove path
  sample_name="${fastq##*/}"

  # Get sample name from first _-delimited field
  sample_name=$(echo "${sample_name}" | awk -F "_" '{print $1}')

  # Set treatment condition for each sample
  if [[ "${sample_name}" == "S12M" ]] \
  || [[ "${sample_name}" == "S22F" ]] \
  || [[ "${sample_name}" == "S23M" ]] \
  || [[ "${sample_name}" == "S29F" ]] \
  || [[ "${sample_name}" == "S31M" ]] \
  || [[ "${sample_name}" == "S35F" ]] \
  || [[ "${sample_name}" == "S36F" ]] \
  || [[ "${sample_name}" == "S3F" ]] \
  || [[ "${sample_name}" == "S41F" ]] \
  || [[ "${sample_name}" == "S48F" ]] \
  || [[ "${sample_name}" == "S50F" ]] \
  || [[ "${sample_name}" == "S59M" ]] \
  || [[ "${sample_name}" == "S77F" ]] \
  || [[ "${sample_name}" == "S9M" ]]
  then
    treatment="exposed"
  else
    treatment="control"
  fi

  # Append to associative array
  samples_associative_array+=(["${sample_name}"]="${treatment}")

done

# Check array size to confirm it has all expected samples
# Exit if mismatch
if [[ "${#samples_associative_array[@]}" != "${sample_counter}" ]] \
|| [[ "${#samples_associative_array[@]}" != "${total_samples}" ]]
then
  echo "samples_associative_array doesn't have all 26 samples."
  echo ""
  echo "samples_associative_array contents:"
  echo ""
  for item in "${!samples_associative_array[@]}"
  do
    printf "%s\t%s\n" "${item}" "${samples_associative_array[${item}]}"
  done

  exit
fi

# Copy Hisat2 genome index files
rsync -av "${genome_index_dir}"/${genome_index_name}*.ht2 .

for sample in "${!samples_associative_array[@]}"
do

  ## Initialize arrays
  fastq_array_R1=()
  fastq_array_R2=()

  # Create array of fastq R1 files
  # and generate MD5 checksums file.
  for fastq in "${fastq_dir}""${sample}"*_R1.fastp-trim.20bp-5prime.20220224.fq.gz
  do
    fastq_array_R1+=("${fastq}")
    echo "Generating checksum for ${fastq}..."
    md5sum "${fastq}" >> input_fastqs_checksums.md5
    echo "Checksum for ${fastq} completed."
    echo ""
  done

  # Create array of fastq R2 files
  for fastq in "${fastq_dir}""${sample}"*_R2.fastp-trim.20bp-5prime.20220224.fq.gz
  do
    fastq_array_R2+=("${fastq}")
    echo "Generating checksum for ${fastq}..."
    md5sum "${fastq}" >> input_fastqs_checksums.md5
    echo "Checksum for ${fastq} completed."
    echo ""
  done

  # Create comma-separated lists of FastQs for Hisat2
  printf -v joined_R1 '%s,' "${fastq_array_R1[@]}"
  fastq_list_R1=$(echo "${joined_R1%,}")

  printf -v joined_R2 '%s,' "${fastq_array_R2[@]}"
  fastq_list_R2=$(echo "${joined_R2%,}")

  # Create and switch to dedicated sample directory
  mkdir "${sample}" && cd "$_"

  # Hisat2 alignments
  # Sets read group info (RG) using samples array
  "${programs_array[hisat2]}" \
  -x "${genome_index_name}" \
  -1 "${fastq_list_R1}" \
  -2 "${fastq_list_R2}" \
  -S "${sample}".sam \
  --rg-id "${sample}" \
  --rg "SM:""${samples_associative_array[$sample]}" \
  2> "${sample}"_hisat2.err

  # Sort SAM files, convert to BAM, and index
  ${programs_array[samtools_view]} \
  -@ "${threads}" \
  -Su "${sample}".sam \
  | ${programs_array[samtools_sort]} - \
  -@ "${threads}" \
  -o "${sample}".sorted.bam

  # Index BAM
  ${programs_array[samtools_index]} "${sample}".sorted.bam

  # Run stringtie on alignments
  # Uses "-B" option to output tables intended for use in Ballgown
  # Uses "-e" option; recommended when using "-B" option.
  # Limits analysis to only read alignments matching reference.
  "${programs_array[stringtie]}" "${sample}".sorted.bam \
  -p "${threads}" \
  -o "${sample}".gtf \
  -G "${genome_gff}" \
  -C "${sample}.cov_refs.gtf" \
  -B \
  -e

  # Add GTFs to list file, only if non-empty
  # Identifies GTF files that only have header
  gtf_lines=$(wc -l < "${sample}".gtf )
  if [ "${gtf_lines}" -gt 2 ]; then
    echo "$(pwd)/${sample}.gtf" >> ../"${gtf_list}"
  fi

  # Delete unneeded SAM files
  rm ./*.sam

  # Generate checksums
  for file in *
  do
    md5sum "${file}" >> ${sample}_checksums.md5
  done

  # Move up to orig. working directory
  cd ..
done

# Merge all BAMs to singular BAM for use in transcriptome assembly later
## Create list of sorted BAMs for merging
find . -name "*sorted.bam" > sorted_bams.list

## Merge sorted BAMs
${programs_array[samtools_merge]} \
-b sorted_bams.list \
${merged_bam} \
--threads ${threads}

## Index merged BAM
${programs_array[samtools_index]} ${merged_bam}

# Create singular transcript file, using GTF list file
"${programs_array[stringtie]}" --merge \
"${gtf_list}" \
-p "${threads}" \
-G "${genome_gff}" \
-o "${genome_index_name}".stringtie.gtf

# Create file list for prepDE.py
while read -r line
do
  echo ${line##*/} ${line}
done < gtf_list.txt >> prepDE-sample_list.txt

# Create count matrices for genes and transcripts
# Compatible with import to DESeq2
python3 "${programs_array[prepDE]}" --input=prepDE-sample_list.txt

# Delete unnecessary index files
rm "${genome_index_name}"*.ht2

# Generate checksums
# Uses find command to avoid passing
# directory names to the md5sum command.
find . -maxdepth 1 -type f -exec md5sum {} + \
| tee --append checksums.md5

#######################################################################################################

# Capture program options
if [[ "${#programs_array[@]}" -gt 0 ]]; then
  echo "Logging program options..."
  for program in "${!programs_array[@]}"
  do
    {
    echo "Program options for ${program}: "
    echo ""
    # Handle samtools help menus
    if [[ "${program}" == "samtools_index" ]] \
    || [[ "${program}" == "samtools_sort" ]] \
    || [[ "${program}" == "samtools_view" ]]
    then
      ${programs_array[$program]}

    # Handle DIAMOND BLAST menu
    elif [[ "${program}" == "diamond" ]]; then
      ${programs_array[$program]} help

    # Handle NCBI BLASTx menu
    elif [[ "${program}" == "blastx" ]]; then
      ${programs_array[$program]} -help

    # Handle StringTie prepDE script
    elif [[ "${program}" == "prepDE" ]]; then
      python3 ${programs_array[$program]} -h
    fi
    ${programs_array[$program]} -h
    echo ""
    echo ""
    echo "----------------------------------------------"
    echo ""
    echo ""
    } &>> program_options.log || true

    # If MultiQC is in programs_array, copy the config file to this directory.
    if [[ "${program}" == "multiqc" ]]; then
      cp --preserve ~/.multiqc_config.yaml multiqc_config.yaml
    fi
  done
  echo "Finished logging programs options."
  echo ""
fi


# Document programs in PATH (primarily for program version ID)
echo "Logging system $PATH..."
{
date
echo ""
echo "System PATH for $SLURM_JOB_ID"
echo ""
printf "%0.s-" {1..10}
echo "${PATH}" | tr : \\n
} >> system_path.log

echo "Finished logging system $PATH."
```

---

# RESULTS

Took about 2.5 days to run.
Output folder:

- [20230821-cvir-stringtie-GCF_002022765.2-isoforms/](https://gannet.fish.washington.edu/Atumefaciens/20230821-cvir-stringtie-GCF_002022765.2-isoforms/)

  - List of input FastQs and checksums (text):

    - [20230821-cvir-stringtie-GCF_002022765.2-isoforms/input_fastqs_checksums.md5](https://gannet.fish.washington.edu/Atumefaciens/20230821-cvir-stringtie-GCF_002022765.2-isoforms/input_fastqs_checksums.md5)

  - Full GTF file (GTF; 143MB):

    - [20230821-cvir-stringtie-GCF_002022765.2-isoforms/cvir_GCF_002022765.2.stringtie.gtf](https://gannet.fish.washington.edu/Atumefaciens/20230821-cvir-stringtie-GCF_002022765.2-isoforms/cvir_GCF_002022765.2.stringtie.gtf)

  - Merged BAM file (79GB):

    - [20230821-cvir-stringtie-GCF_002022765.2-isoforms/20230821_cvir_stringtie_GCF_002022765-sorted-bams-merged.bam](https://gannet.fish.washington.edu/Atumefaciens/20230821-cvir-stringtie-GCF_002022765.2-isoforms/20230821_cvir_stringtie_GCF_002022765-sorted-bams-merged.bam)

      - MD5 checksum:

        - `466815fe9fa3f559b500ea8aff2de5b1`

  - Merged BAM index file (useful for IGV):

    - [20230821-cvir-stringtie-GCF_002022765.2-isoforms/20230821_cvir_stringtie_GCF_002022765-sorted-bams-merged.bam.bai](https://gannet.fish.washington.edu/Atumefaciens/20230821-cvir-stringtie-GCF_002022765.2-isoforms/20230821_cvir_stringtie_GCF_002022765-sorted-bams-merged.bam.bai)

  - Gene counts matrix (CSV; 4.5MB):

    - [gene_count_matrix.csv](https://gannet.fish.washington.edu/Atumefaciens/20230821-cvir-stringtie-GCF_002022765.2-isoforms/gene_count_matrix.csv)

  - Transcript counts matrix (CSV; 6.3MB):

    - [transcript_count_matrix.csv](https://gannet.fish.washington.edu/Atumefaciens/20230821-cvir-stringtie-GCF_002022765.2-isoforms/transcript_count_matrix.csv)

Since there are a large number of folders/files, the resulting directory structure for all of the [`StringTie`](https://ccb.jhu.edu/software/stringtie/) output is [shown at the end of this post](#directory-tree).
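Because the merged BAM is so large (79GB), it's worth confirming that a download arrived intact by comparing it against the MD5 checksum listed above. Here's a minimal sketch using GNU `md5sum --check`, which reads `<checksum>  <filename>` lines (two spaces) from standard input when no file is given. The helper name `verify_md5` is hypothetical, and the example assumes the BAM sits in the current directory under its original filename.

```bash
#!/bin/bash
# Hypothetical helper: verify a file against an expected MD5 checksum.
# Builds a "<checksum>  <filename>" line and feeds it to md5sum --check.
# --status suppresses output; the exit code is 0 only if the checksum matches.
verify_md5() {
  local expected="$1" file="$2"
  echo "${expected}  ${file}" | md5sum --check --status
}

# Example usage with the merged BAM checksum listed above:
# verify_md5 466815fe9fa3f559b500ea8aff2de5b1 \
#   20230821_cvir_stringtie_GCF_002022765-sorted-bams-merged.bam \
#   && echo "Checksum OK" || echo "Checksum mismatch"
```

The same approach works for any file with a known checksum, including those recorded in the `*_checksums.md5` files in each sample directory (which can also be passed directly to `md5sum --check`).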
Here's a description of all the file types found in each directory:

- `*.ctab`: See [`ballgown` documentation](https://github.com/alyssafrazee/ballgown) for a description of these.

- `*_checksums.md5`: MD5 checksums for all files in each directory.

- `*.cov_refs.gtf`: GTF of the reference transcripts that are fully covered by reads, generated by [`StringTie`](https://ccb.jhu.edu/software/stringtie/) (`-C` option).

- `*.gtf`: Final GTF file produced by [`StringTie`](https://ccb.jhu.edu/software/stringtie/) for each sample.

- `*_hisat2.err`: Standard error output from [`HISAT2`](https://daehwankimlab.github.io/hisat2/). Contains alignment info.

- `*.sorted.bam`: [`HISAT2`](https://daehwankimlab.github.io/hisat2/) alignments, converted to BAM and sorted with SAMtools.

- `*.sorted.bam.bai`: BAM index file.

The [initial alignments from 20210726](https://robertslab.github.io/sams-notebook/posts/2021/2021-07-26-Transcript-Identification-and-Quantification---C.virginia-RNAseq-With-NCBI-Genome-GCF_002022765.2-Using-StringTie-on-Mox/), which accidentally used untrimmed sequencing reads, had some truly abysmal alignment rates (males were ~30% and females were around 45%). This round is a _marked_ improvement. The females exhibit alignment rates around what one would expect (73-84%, with most > 80%), while the males, although still relatively low (57-72%, with most around 57-59%), are drastically better than the ~30% seen when using the untrimmed reads.

Still, the alignment rates are consistently lower in males compared to females. Not sure what this means, but I'm exploring some additional avenues to investigate (e.g. possible residual rRNA, possible contamination with RNA from other organisms).

Here's a table. The letter `M` or `F` in the sample name column indicates sex.
| Sample | Alignment Rate |
|--------|----------------|
| S12M   | 58.09%         |
| S13M   | 58.44%         |
| S16F   | 81.08%         |
| S19F   | 82.05%         |
| S22F   | 82.16%         |
| S23M   | 57.06%         |
| S29F   | 75.92%         |
| S31M   | 61.12%         |
| S35F   | 81.95%         |
| S36F   | 80.60%         |
| S39F   | 82.52%         |
| S3F    | 82.31%         |
| S41F   | 78.38%         |
| S44F   | 78.70%         |
| S48M   | 57.60%         |
| S50F   | 82.96%         |
| S52F   | 73.20%         |
| S53F   | 81.48%         |
| S54F   | 77.75%         |
| S59M   | 65.81%         |
| S64M   | 71.53%         |
| S6M    | 57.82%         |
| S76F   | 82.82%         |
| S77F   | 84.37%         |
| S7M    | 58.74%         |
| S9M    | 57.95%         |

As hoped/expected, these alignment rates are identical to [the previous run using an old version of Stringtie from 20220225](https://robertslab.github.io/sams-notebook/posts/2022/2022-02-25-Transcript-Identification-and-Alignments---C.virginica-RNAseq-with-NCBI-Genome-GCF_002022765.2-Using-Hisat2-and-Stringtie-on-Mox/) (notebook).

Now, need to send this data back through Ballgown...

---

### Directory tree

```bash
├── 20230821-cvir-stringtie-GCF_002022765.2-isoforms.sh
├── 20230821_cvir_stringtie_GCF_002022765-sorted-bams-merged.bam
├── 20230821_cvir_stringtie_GCF_002022765-sorted-bams-merged.bam.bai
├── checksums.md5
├── cvir_GCF_002022765.2.stringtie.gtf
├── gene_count_matrix.csv
├── gtf_list.txt
├── input_fastqs_checksums.md5
├── prepDE-sample_list.txt
├── program_options.log
├── S12M
│   ├── e2t.ctab
│   ├── e_data.ctab
│   ├── i2t.ctab
│   ├── i_data.ctab
│   ├── S12M_checksums.md5
│   ├── S12M.cov_refs.gtf
│   ├── S12M.gtf
│   ├── S12M_hisat2.err
│   ├── S12M.sorted.bam
│   ├── S12M.sorted.bam.bai
│   └── t_data.ctab
├── S13M
│   ├── e2t.ctab
│   ├── e_data.ctab
│   ├── i2t.ctab
│   ├── i_data.ctab
│   ├── S13M_checksums.md5
│   ├── S13M.cov_refs.gtf
│   ├── S13M.gtf
│   ├── S13M_hisat2.err
│   ├── S13M.sorted.bam
│   ├── S13M.sorted.bam.bai
│   └── t_data.ctab
├── S16F
│   ├── e2t.ctab
│   ├── e_data.ctab
│   ├── i2t.ctab
│   ├── i_data.ctab
│   ├── S16F_checksums.md5
│   ├── S16F.cov_refs.gtf
│   ├── S16F.gtf
│   ├── S16F_hisat2.err
│   ├── S16F.sorted.bam
│   ├── S16F.sorted.bam.bai
│   └── t_data.ctab
├── S19F
│   ├── e2t.ctab
│   ├── e_data.ctab
│   ├── i2t.ctab
│   ├── i_data.ctab
│   ├── S19F_checksums.md5
│   ├── S19F.cov_refs.gtf
│   ├── S19F.gtf
│   ├── S19F_hisat2.err
│   ├── S19F.sorted.bam
│   ├── S19F.sorted.bam.bai
│   └── t_data.ctab
├── S22F
│   ├── e2t.ctab
│   ├── e_data.ctab
│   ├── i2t.ctab
│   ├── i_data.ctab
│   ├── S22F_checksums.md5
│   ├── S22F.cov_refs.gtf
│   ├── S22F.gtf
│   ├── S22F_hisat2.err
│   ├── S22F.sorted.bam
│   ├── S22F.sorted.bam.bai
│   └── t_data.ctab
├── S23M
│   ├── e2t.ctab
│   ├── e_data.ctab
│   ├── i2t.ctab
│   ├── i_data.ctab
│   ├── S23M_checksums.md5
│   ├── S23M.cov_refs.gtf
│   ├── S23M.gtf
│   ├── S23M_hisat2.err
│   ├── S23M.sorted.bam
│   ├── S23M.sorted.bam.bai
│   └── t_data.ctab
├── S29F
│   ├── e2t.ctab
│   ├── e_data.ctab
│   ├── i2t.ctab
│   ├── i_data.ctab
│   ├── S29F_checksums.md5
│   ├── S29F.cov_refs.gtf
│   ├── S29F.gtf
│   ├── S29F_hisat2.err
│   ├── S29F.sorted.bam
│   ├── S29F.sorted.bam.bai
│   └── t_data.ctab
├── S31M
│   ├── e2t.ctab
│   ├── e_data.ctab
│   ├── i2t.ctab
│   ├── i_data.ctab
│   ├── S31M_checksums.md5
│   ├── S31M.cov_refs.gtf
│   ├── S31M.gtf
│   ├── S31M_hisat2.err
│   ├── S31M.sorted.bam
│   ├── S31M.sorted.bam.bai
│   └── t_data.ctab
├── S35F
│   ├── e2t.ctab
│   ├── e_data.ctab
│   ├── i2t.ctab
│   ├── i_data.ctab
│   ├── S35F_checksums.md5
│   ├── S35F.cov_refs.gtf
│   ├── S35F.gtf
│   ├── S35F_hisat2.err
│   ├── S35F.sorted.bam
│   ├── S35F.sorted.bam.bai
│   └── t_data.ctab
├── S36F
│   ├── e2t.ctab
│   ├── e_data.ctab
│   ├── i2t.ctab
│   ├── i_data.ctab
│   ├── S36F_checksums.md5
│   ├── S36F.cov_refs.gtf
│   ├── S36F.gtf
│   ├── S36F_hisat2.err
│   ├── S36F.sorted.bam
│   ├── S36F.sorted.bam.bai
│   └── t_data.ctab
├── S39F
│   ├── e2t.ctab
│   ├── e_data.ctab
│   ├── i2t.ctab
│   ├── i_data.ctab
│   ├── S39F_checksums.md5
│   ├── S39F.cov_refs.gtf
│   ├── S39F.gtf
│   ├── S39F_hisat2.err
│   ├── S39F.sorted.bam
│   ├── S39F.sorted.bam.bai
│   └── t_data.ctab
├── S3F
│   ├── e2t.ctab
│   ├── e_data.ctab
│   ├── i2t.ctab
│   ├── i_data.ctab
│   ├── S3F_checksums.md5
│   ├── S3F.cov_refs.gtf
│   ├── S3F.gtf
│   ├── S3F_hisat2.err
│   ├── S3F.sorted.bam
│   ├── S3F.sorted.bam.bai
│   └── t_data.ctab
├── S41F
│   ├── e2t.ctab
│   ├── e_data.ctab
│   ├── i2t.ctab
│   ├── i_data.ctab
│   ├── S41F_checksums.md5
│   ├── S41F.cov_refs.gtf
│   ├── S41F.gtf
│   ├── S41F_hisat2.err
│   ├── S41F.sorted.bam
│   ├── S41F.sorted.bam.bai
│   └── t_data.ctab
├── S44F
│   ├── e2t.ctab
│   ├── e_data.ctab
│   ├── i2t.ctab
│   ├── i_data.ctab
│   ├── S44F_checksums.md5
│   ├── S44F.cov_refs.gtf
│   ├── S44F.gtf
│   ├── S44F_hisat2.err
│   ├── S44F.sorted.bam
│   ├── S44F.sorted.bam.bai
│   └── t_data.ctab
├── S48M
│   ├── e2t.ctab
│   ├── e_data.ctab
│   ├── i2t.ctab
│   ├── i_data.ctab
│   ├── S48M_checksums.md5
│   ├── S48M.cov_refs.gtf
│   ├── S48M.gtf
│   ├── S48M_hisat2.err
│   ├── S48M.sorted.bam
│   ├── S48M.sorted.bam.bai
│   └── t_data.ctab
├── S50F
│   ├── e2t.ctab
│   ├── e_data.ctab
│   ├── i2t.ctab
│   ├── i_data.ctab
│   ├── S50F_checksums.md5
│   ├── S50F.cov_refs.gtf
│   ├── S50F.gtf
│   ├── S50F_hisat2.err
│   ├── S50F.sorted.bam
│   ├── S50F.sorted.bam.bai
│   └── t_data.ctab
├── S52F
│   ├── e2t.ctab
│   ├── e_data.ctab
│   ├── i2t.ctab
│   ├── i_data.ctab
│   ├── S52F_checksums.md5
│   ├── S52F.cov_refs.gtf
│   ├── S52F.gtf
│   ├── S52F_hisat2.err
│   ├── S52F.sorted.bam
│   ├── S52F.sorted.bam.bai
│   └── t_data.ctab
├── S53F
│   ├── e2t.ctab
│   ├── e_data.ctab
│   ├── i2t.ctab
│   ├── i_data.ctab
│   ├── S53F_checksums.md5
│   ├── S53F.cov_refs.gtf
│   ├── S53F.gtf
│   ├── S53F_hisat2.err
│   ├── S53F.sorted.bam
│   ├── S53F.sorted.bam.bai
│   └── t_data.ctab
├── S54F
│   ├── e2t.ctab
│   ├── e_data.ctab
│   ├── i2t.ctab
│   ├── i_data.ctab
│   ├── S54F_checksums.md5
│   ├── S54F.cov_refs.gtf
│   ├── S54F.gtf
│   ├── S54F_hisat2.err
│   ├── S54F.sorted.bam
│   ├── S54F.sorted.bam.bai
│   └── t_data.ctab
├── S59M
│   ├── e2t.ctab
│   ├── e_data.ctab
│   ├── i2t.ctab
│   ├── i_data.ctab
│   ├── S59M_checksums.md5
│   ├── S59M.cov_refs.gtf
│   ├── S59M.gtf
│   ├── S59M_hisat2.err
│   ├── S59M.sorted.bam
│   ├── S59M.sorted.bam.bai
│   └── t_data.ctab
├── S64M
│   ├── e2t.ctab
│   ├── e_data.ctab
│   ├── i2t.ctab
│   ├── i_data.ctab
│   ├── S64M_checksums.md5
│   ├── S64M.cov_refs.gtf
│   ├── S64M.gtf
│   ├── S64M_hisat2.err
│   ├── S64M.sorted.bam
│   ├── S64M.sorted.bam.bai
│   └── t_data.ctab
├── S6M
│   ├── e2t.ctab
│   ├── e_data.ctab
│   ├── i2t.ctab
│   ├── i_data.ctab
│   ├── S6M_checksums.md5
│   ├── S6M.cov_refs.gtf
│   ├── S6M.gtf
│   ├── S6M_hisat2.err
│   ├── S6M.sorted.bam
│   ├── S6M.sorted.bam.bai
│   └── t_data.ctab
├── S76F
│   ├── e2t.ctab
│   ├── e_data.ctab
│   ├── i2t.ctab
│   ├── i_data.ctab
│   ├── S76F_checksums.md5
│   ├── S76F.cov_refs.gtf
│   ├── S76F.gtf
│   ├── S76F_hisat2.err
│   ├── S76F.sorted.bam
│   ├── S76F.sorted.bam.bai
│   └── t_data.ctab
├── S77F
│   ├── e2t.ctab
│   ├── e_data.ctab
│   ├── i2t.ctab
│   ├── i_data.ctab
│   ├── S77F_checksums.md5
│   ├── S77F.cov_refs.gtf
│   ├── S77F.gtf
│   ├── S77F_hisat2.err
│   ├── S77F.sorted.bam
│   ├── S77F.sorted.bam.bai
│   └── t_data.ctab
├── S7M
│   ├── e2t.ctab
│   ├── e_data.ctab
│   ├── i2t.ctab
│   ├── i_data.ctab
│   ├── S7M_checksums.md5
│   ├── S7M.cov_refs.gtf
│   ├── S7M.gtf
│   ├── S7M_hisat2.err
│   ├── S7M.sorted.bam
│   ├── S7M.sorted.bam.bai
│   └── t_data.ctab
├── S9M
│   ├── e2t.ctab
│   ├── e_data.ctab
│   ├── i2t.ctab
│   ├── i_data.ctab
│   ├── S9M_checksums.md5
│   ├── S9M.cov_refs.gtf
│   ├── S9M.gtf
│   ├── S9M_hisat2.err
│   ├── S9M.sorted.bam
│   ├── S9M.sorted.bam.bai
│   └── t_data.ctab
├── slurm-4778731.out
├── slurm-4800642.out
├── slurm-4800644.out
├── sorted_bams.list
├── system_path.log
└── transcript_count_matrix.csv

27 directories, 302 files
```
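The alignment rates in the table above come from the per-sample `*_hisat2.err` files. Below is a minimal sketch for pulling them out of the sample directories shown in the tree, assuming the `<output_dir>/<sample>/<sample>_hisat2.err` layout and HISAT2's final summary line of the form `NN.NN% overall alignment rate`. The function name `collect_alignment_rates` is hypothetical; it prints a tab-separated sample/rate table, with rows in the shell's glob (locale) order.

```bash
#!/bin/bash
# Hypothetical helper: tabulate HISAT2 overall alignment rates per sample.
# Assumes <output_dir>/<sample>/<sample>_hisat2.err, each containing
# HISAT2's summary line "NN.NN% overall alignment rate".
collect_alignment_rates() {
  local outdir="${1:-.}"
  local err_file sample rate
  printf "Sample\tAlignment Rate\n"
  for err_file in "${outdir}"/*/*_hisat2.err; do
    [[ -e "${err_file}" ]] || continue   # skip if the glob matched nothing
    sample="${err_file##*/}"             # strip directory path
    sample="${sample%_hisat2.err}"       # strip suffix to get sample name
    # First field of the summary line is the percentage (e.g. "58.09%")
    rate=$(awk '/overall alignment rate/ { print $1 }' "${err_file}")
    printf "%s\t%s\n" "${sample}" "${rate}"
  done
}
```

For example, `collect_alignment_rates 20230821-cvir-stringtie-GCF_002022765.2-isoforms` run from the parent directory would produce one row per sample directory.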