---
author: Sam White
toc-title: Contents
toc-depth: 5
toc-location: left
layout: post
title: Transcript Identification and Alignments - P.verrucosa RNA-seq with Pver_genome_assembly_v1.0 Using HiSat2 and Stringtie on Mox
date: '2023-02-16 08:20'
tags: 
  - hisat2
  - stringtie
  - mox
  - RNAseq
  - ballgown
  - Pocillopora verrucosa
  - coral
  - E5
categories: 
  - 2023
  - E5
---
After getting the [RNA-seq data trimmed](https://robertslab.github.io/sams-notebook/posts/2023/2023-02-15-FastQ-Trimming-and-QC---P.verrucosa-RNA-seq-Data-from-Danielle-Becker-in-Hollie-Putnam-Lab-Using-fastp-FastQC-and-MultiQC-on-Mox/), it was time to perform alignments and determine expression levels of transcripts/isoforms using [`HISAT2`](https://daehwankimlab.github.io/hisat2/) and [`StringTie`](https://ccb.jhu.edu/software/stringtie/), respectively. [`StringTie`](https://ccb.jhu.edu/software/stringtie/) was set to output tables formatted for import into [`ballgown`](https://github.com/alyssafrazee/ballgown). After those two analyses were complete, I ran [`gffcompare`](https://ccb.jhu.edu/software/stringtie/gffcompare.shtml) to compare the merged [`StringTie`](https://ccb.jhu.edu/software/stringtie/) GTF against the input GFF3. I saw this step in one of Danielle Becker's scripts and thought it might be interesting. The analyses were run on Mox.

SBATCH Script (GitHub):

- [20230216-pver-stringtie-Pver_genome_assembly_v1.0-isoforms.sh](https://github.com/RobertsLab/sams-notebook/blob/master/sbatch_scripts/20230216-pver-stringtie-Pver_genome_assembly_v1.0-isoforms.sh)

```shell
#!/bin/bash
## Job Name
#SBATCH --job-name=20230216-pver-stringtie-Pver_genome_assembly_v1.0-isoforms
## Allocation Definition
#SBATCH --account=srlab
#SBATCH --partition=srlab
## Resources
## Nodes
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=07-00:00:00
## Memory per node
#SBATCH --mem=120G
##turn on e-mail notification
#SBATCH --mail-type=ALL
#SBATCH --mail-user=samwhite@uw.edu
## Specify the working directory for this job
#SBATCH --chdir=/gscratch/scrubbed/samwhite/outputs/20230216-pver-stringtie-Pver_genome_assembly_v1.0-isoforms


## Script using Stringtie with P.verrucosa v1.0 genome assembly
## and HiSat2 index generated on 20230131.

## Genome and GFF from here: http://pver.reefgenomics.org/download

## GTF generated on 20230127 by SJW:
## https://robertslab.github.io/sams-notebook/posts/2023/2023-01-27-Data-Wrangling---P.verrucosa-Genome-GFF-to-GTF-Using-gffread/

## HiSat2 index generated on 20230131 by SJW:
## https://robertslab.github.io/sams-notebook/posts/2023/2023-01-31-Genome-Indexing---P.verrucosa-v1.0-Assembly-with-HiSat2-on-Mox/

## Using trimmed FastQs from 20230215.

## Expects FastQ input filenames to match <sample name>_R[12].fastp-trim.20230215.fq.gz


###################################################################################
# These variables need to be set by user

## Assign Variables

# Set total number of SAMPLES (NOT number of FastQ files)
total_samples=32

# Set number of CPUs to use
threads=28

# Index name for Hisat2 use
# Needs to match index name used in previous Hisat2 indexing step
genome_index_name="Pver_genome_assembly_v1.0"

# HiSat2 indexes tarball
index_tarball="Pver_genome_assembly_v1.0-hisat2-indices.tar.gz"

# Set input FastQ patterns
R1_fastq_pattern='*R1*fq.gz'
R2_fastq_pattern='*R2*fq.gz'
fastq_pattern='*.fastp-trim.20230215.fq.gz'

# Location of Hisat2 index files
# Must keep variable name formatting, as it's used by HiSat2
HISAT2_INDEXES=$(pwd)
export HISAT2_INDEXES

# Paths to programs
gffcompare="/gscratch/srlab/programs/gffcompare-0.12.6.Linux_x86_64/gffcompare"
hisat2_dir="/gscratch/srlab/programs/hisat2-2.1.0"
hisat2="${hisat2_dir}/hisat2"
samtools="/gscratch/srlab/programs/samtools-1.10/samtools"
stringtie="/gscratch/srlab/programs/stringtie-2.2.1.Linux_x86_64/stringtie"

# Input/output files
genome_index_dir="/gscratch/srlab/sam/data/P_verrucosa/genomes"
genome_gff="${genome_index_dir}/Pver_genome_assembly_v1.0-valid.gff3"
fastq_dir="/gscratch/srlab/sam/data/P_verrucosa/RNAseq/"
gtf_list="gtf_list.txt"
merged_bam="20230216-pver-stringtie-pver_v1.0-sorted-bams-merged.bam"

# Declare associative array of sample names and metadata
declare -A samples_associative_array=()

# Programs associative array
declare -A programs_array
programs_array=(
[gffcompare]="${gffcompare}" \
[hisat2]="${hisat2}" \
[samtools_index]="${samtools} index" \
[samtools_merge]="${samtools} merge" \
[samtools_sort]="${samtools} sort" \
[samtools_view]="${samtools} view" \
[stringtie]="${stringtie}"
)


###################################################################################################

# Exit script if any command fails
set -e

# Load Python Mox module for Python module availability

module load intel-python3_2017

## Load associative array
## Only need to use one set of reads to capture sample name

# Set sample counter for array verification
sample_counter=0

# Load array
# DO NOT QUOTE ${R1_fastq_pattern} - WILL NOT POPULATE ARRAY!
for fastq in "${fastq_dir}"${R1_fastq_pattern}
do
  # Increment counter
  ((sample_counter+=1))

  # Remove path
  sample_name="${fastq##*/}"

  # Get sample name from first _-delimited field
  sample_name=$(echo "${sample_name}" | awk -F "_" '{print $1}')
  
  # Set treatment condition for each sample
  # Used for setting read group (@RG) in SAM files
  if [[ "${sample_name}" == "C17" ]] || \
[[ "${sample_name}" == "C18" ]] || \
[[ "${sample_name}" == "C19" ]] || \
[[ "${sample_name}" == "C20" ]] || \
[[ "${sample_name}" == "C21" ]] || \
[[ "${sample_name}" == "C22" ]] || \
[[ "${sample_name}" == "C23" ]] || \
[[ "${sample_name}" == "C24" ]] || \
[[ "${sample_name}" == "C25" ]] || \
[[ "${sample_name}" == "C26" ]] || \
[[ "${sample_name}" == "C27" ]] || \
[[ "${sample_name}" == "C28" ]] || \
[[ "${sample_name}" == "C29" ]] || \
[[ "${sample_name}" == "C30" ]] || \
[[ "${sample_name}" == "C31" ]] || \
[[ "${sample_name}" == "C32" ]];
  then
    treatment="control"
  else
    treatment="enriched"
  fi

  # Append to associative array
  samples_associative_array+=(["${sample_name}"]="${treatment}")

done

# Check array size to confirm it has all expected samples
# Exit if mismatch
if [[ "${#samples_associative_array[@]}" != "${sample_counter}" ]] \
|| [[ "${#samples_associative_array[@]}" != "${total_samples}" ]]
  then
    echo "samples_associative_array doesn't have all ${total_samples} samples."
    echo ""
    echo "samples_associative_array contents:"
    echo ""
    for item in "${!samples_associative_array[@]}"
    do
      printf "%s\t%s\n" "${item}" "${samples_associative_array[${item}]}"
    done

    # Exit with a non-zero status so the job is reported as failed
    exit 1
fi

# Copy Hisat2 genome index files
echo ""
echo "Transferring HiSat2 index file now."
echo ""
rsync -av "${genome_index_dir}/${index_tarball}" .
echo ""

# Unpack Hisat2 index files
echo ""
echo "Unpacking Hisat2 index tarball: ${index_tarball}..."
echo ""
tar -xzvf ${index_tarball}
echo "Finished unpacking ${index_tarball}"
echo ""

#### BEGIN HISAT2 ALIGNMENTS ####
echo "Beginning HiSat2 alignments and StringTie analysis..."
echo ""
for sample in "${!samples_associative_array[@]}"
do

  ## Initialize arrays
  fastq_array_R1=()
  fastq_array_R2=()

  # Create array of fastq R1 files
  # and generate MD5 checksums file.

  # DO NOT QUOTE ${R1_fastq_pattern}
  for fastq in "${fastq_dir}"${R1_fastq_pattern}
  do

    # Remove path
    sample_name="${fastq##*/}"

    # Get sample name from first _-delimited field
    sample_name=$(echo "${sample_name}" | awk -F "_" '{print $1}')

    # Check sample names for match
    if [[ "${sample_name}" == "${sample}" ]]
    then
      echo "Now working on ${sample} Read 1 FastQs."

      fastq_array_R1+=("${fastq}")

      echo "Generating checksum for ${fastq}..."

      md5sum "${fastq}" >> input_fastqs_checksums.md5

      echo "Checksum for ${fastq} completed."
      echo ""
    fi

  done

  # Create array of fastq R2 files
  # DO NOT QUOTE ${R2_fastq_pattern}
  for fastq in "${fastq_dir}"${R2_fastq_pattern}
  do
    # Remove path
    sample_name="${fastq##*/}"

    # Get sample name from first _-delimited field
    sample_name=$(echo "${sample_name}" | awk -F "_" '{print $1}')

    # Check sample names for match
    if [[ "${sample_name}" == "${sample}" ]]
    then
      echo "Now working on ${sample} Read 2 FastQs."

      fastq_array_R2+=("${fastq}")

      echo "Generating checksum for ${fastq}..."

      md5sum "${fastq}" >> input_fastqs_checksums.md5
      
      echo "Checksum for ${fastq} completed."
      echo ""
    fi
  done

  echo "Checksums for ${sample} Read 1 and 2 completed."

  # Create comma-separated lists of FastQs for Hisat2
  printf -v joined_R1 '%s,' "${fastq_array_R1[@]}"
  fastq_list_R1=$(echo "${joined_R1%,}")

  printf -v joined_R2 '%s,' "${fastq_array_R2[@]}"
  fastq_list_R2=$(echo "${joined_R2%,}")

  # Create and switch to dedicated sample directory
  echo ""
  echo "Creating ${sample} directory."
  mkdir "${sample}" && cd "$_"
  echo "Now in ${sample} directory."

  # HiSat2 alignments
  # Sets read group info (RG) using samples array
  echo ""
  echo "Running HiSat2 for sample ${sample}."
  "${programs_array[hisat2]}" \
  -x "${genome_index_name}" \
  -1 "${fastq_list_R1}" \
  -2 "${fastq_list_R2}" \
  -S "${sample}".sam \
  --rg-id "${sample}" \
  --rg "SM:""${samples_associative_array[$sample]}" \
  --threads "${threads}" \
  2> "${sample}-hisat2_stats.txt"
  echo ""
  echo "Hisat2 for  ${fastq_list_R1} and ${fastq_list_R2} complete."
  echo ""

  # Sort SAM files, convert to BAM, and index
  echo ""
  echo "Sorting ${sample}.sam and creating sorted BAM."
  echo ""
  ${programs_array[samtools_view]} \
  -@ "${threads}" \
  -Su "${sample}".sam \
  | ${programs_array[samtools_sort]} - \
  -@ "${threads}" \
  -o "${sample}".sorted.bam
  echo "Created ${sample}.sorted.bam"
  echo ""


  # Index BAM
  echo ""
  echo "Indexing ${sample}.sorted.bam..."
  ${programs_array[samtools_index]} "${sample}".sorted.bam
  echo ""
  echo "Indexing complete for ${sample}.sorted.bam."
  echo ""

  echo ""
  echo "HiSat2 completed for sample ${sample}."
  echo ""


#### END HISAT2 ALIGNMENTS ####

#### BEGIN STRINGTIE ####

  # Run stringtie on alignments
  # Uses "-B" option to output tables intended for use in Ballgown
  # NOTE: "-e" (recommended when using "-B") is NOT passed below, so
  # analysis is not limited to read alignments matching the reference.
  echo "Beginning StringTie analysis on ${sample}.sorted.bam."
  "${programs_array[stringtie]}" "${sample}".sorted.bam \
  -p "${threads}" \
  -o "${sample}".gtf \
  -G "${genome_gff}" \
  -C "${sample}.cov_refs.gtf" \
  -B
  
  echo "StringTie analysis finished for ${sample}.sorted.bam."
  echo ""
#### END STRINGTIE ####

# Add GTFs to list file, only if non-empty
# Identifies GTF files that only have header
  echo ""
  echo "Adding ${sample}.gtf to ../${gtf_list}."
  gtf_lines=$(wc -l < "${sample}".gtf )
  if [ "${gtf_lines}" -gt 2 ]; then
    echo "$(pwd)/${sample}.gtf" >> ../"${gtf_list}"
  fi
  echo ""

  # Delete unneeded SAM files
  echo "Removing any SAM files."
  echo ""
  rm ./*.sam

  # Generate checksums
  for file in *
  do
    echo ""
    echo "Generating MD5 checksum for ${file}."
    echo ""
    md5sum "${file}" | tee --append "${sample}_checksums.md5"
    echo ""
    echo "${file} checksum added to ${sample}_checksums.md5."
    echo ""
  done

  # Move up to orig. working directory
  echo "Moving to original working directory."
  echo ""
  cd ..
  echo "Now in $(pwd)."
  echo ""

  echo "Finished HiSat2 alignments and StringTie analysis for ${sample} FastQs."
  echo ""

done

echo "Finished all HiSat2 alignments and StringTie analysis."
echo ""


#### BEGIN MERGING BAMs ####

# Merge all BAMs to singular BAM for use in transcriptome assembly later
## Create list of sorted BAMs for merging
echo ""
echo "Creating list file of sorted BAMs..."
find . -name "*sorted.bam" > sorted_bams.list
echo "List of BAMs created: sorted_bams.list"
echo ""

## Merge sorted BAMs
echo "Merging all BAM files..."
echo ""
${programs_array[samtools_merge]} \
-b sorted_bams.list \
${merged_bam} \
--threads ${threads}
echo ""
echo "Finished creating ${merged_bam}."

#### END MERGING BAMs ####

#### BEGIN INDEXING MERGED BAM ####

## Index merged BAM
echo ""
echo "Indexing ${merged_bam}..."
echo ""
${programs_array[samtools_index]} ${merged_bam}
echo "Finished indexing ${merged_bam}."
echo ""

#### END INDEXING MERGED BAM ####

#### BEGIN MERGE STRINGTIE GTFs ####

# Create singular transcript file, using GTF list file
echo "Merging GTFs..."
echo ""
"${programs_array[stringtie]}" --merge \
"${gtf_list}" \
-p "${threads}" \
-G "${genome_gff}" \
-o "${genome_index_name}.stringtie.gtf"
echo ""
echo "Finished merging GTFs into ${genome_index_name}.stringtie.gtf"
echo ""
#### END MERGE STRINGTIE GTFs ####

# Delete unnecessary index files
echo ""
echo "Removing HiSat2 *.ht2 genome index files..."
echo ""
rm "${genome_index_name}"*.ht2
echo "All genome index files removed."
echo ""

#### BEGIN GFFCOMPARE ####
echo ""
echo "Beginning gffcompare..."
echo ""

# Make gffcompare output directory and
# change into that directory
mkdir --parents gffcompare && cd "$_"

"${programs_array[gffcompare]}" \
-r "${genome_gff}" \
-o "${genome_index_name}-gffcmp" \
../"${genome_index_name}.stringtie.gtf"
echo ""
echo "Finished gffcompare"
echo ""

# Generate checksums
for file in *
do
  echo ""
  echo "Generating checksum for ${file}..."
  echo ""

  md5sum "${file}" | tee --append checksums.md5

  echo "Checksum generated."
done

# Move to previous directory
echo "Moving to previous directory..."
echo ""

cd -

echo "Now in $(pwd)."
echo ""

#### END GFFCOMPARE ####

# Generate checksums
echo "Generating checksums for files in $(pwd)."

for file in *
do
  echo ""
  echo "Generating checksum for ${file}..."
  echo ""

  md5sum "${file}" | tee --append checksums.md5

  echo "Checksum generated."
done

# Remove genome index tarball
echo ""
echo "Removing ${index_tarball}."

rm "${index_tarball}"

echo "${index_tarball} has been deleted."
echo ""

#######################################################################################################

# Capture program options
if [[ "${#programs_array[@]}" -gt 0 ]]; then
  echo "Logging program options..."
  for program in "${!programs_array[@]}"
  do
    {
    echo "Program options for ${program}: "
    echo ""
    # Handle samtools help menus
    if [[ "${program}" == "samtools_index" ]] \
    || [[ "${program}" == "samtools_sort" ]] \
    || [[ "${program}" == "samtools_view" ]]
    then
      ${programs_array[$program]}

    # Handle DIAMOND BLAST menu
    elif [[ "${program}" == "diamond" ]]; then
      ${programs_array[$program]} help

    # Handle NCBI BLASTx menu
    elif [[ "${program}" == "blastx" ]]; then
      ${programs_array[$program]} -help
    fi
    ${programs_array[$program]} -h
    echo ""
    echo ""
    echo "----------------------------------------------"
    echo ""
    echo ""
  } &>> program_options.log || true

    # If MultiQC is in programs_array, copy the config file to this directory.
    if [[ "${program}" == "multiqc" ]]; then
      cp --preserve ~/.multiqc_config.yaml multiqc_config.yaml
    fi
  done
  echo "Finished logging programs options."
  echo ""
fi


# Document programs in PATH (primarily for program version ID)
echo "Logging system $PATH..."
{
date
echo ""
echo "System PATH for $SLURM_JOB_ID"
echo ""
printf "%0.s-" {1..10}
echo "${PATH}" | tr : \\n
} >> system_path.log
echo "Finished logging system $PATH."
```
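
The treatment assignment in the script above tests each control sample name individually. A more compact variant is sketched below. The function name `get_treatment` and the example paths are hypothetical and not part of the original script. It assumes, as the script does, that control samples are named `C17` through `C32` and that every other sample is "enriched".

```shell
#!/bin/bash

# Hypothetical helper (not in the original script).
# Assumes control samples are C17-C32 and all others are "enriched",
# mirroring the if/else logic in the SBATCH script.
get_treatment() {
  local fastq="$1"

  # Remove path, then keep the first _-delimited field (the sample name)
  local sample_name="${fastq##*/}"
  sample_name="${sample_name%%_*}"

  # Glob patterns cover C17-C19, C20-C29 and C30-C32
  case "${sample_name}" in
    C1[7-9]|C2[0-9]|C3[0-2]) echo "control" ;;
    *) echo "enriched" ;;
  esac
}

# Example paths are illustrative only
get_treatment "/some/path/C17_R1.fastp-trim.20230215.fq.gz"
get_treatment "E1_R1.fastp-trim.20230215.fq.gz"
```

The two example calls print `control` and `enriched`, respectively. Using parameter expansion for the sample name also avoids spawning `awk` for every FastQ file, although that makes no practical difference for 32 samples.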

---

# RESULTS

This took a bit over 2 days to run. I'm not posting a screencap of runtime like I usually do because the job died due to the Mox `scrubbed` partition being full. I ran the remaining commands manually in order to get this done while that storage issue got resolved.

Output folder:

- [20230216-pver-stringtie-Pver_genome_assembly_v1.0-isoforms](https://gannet.fish.washington.edu/Atumefaciens/20230216-pver-stringtie-Pver_genome_assembly_v1.0-isoforms)

  - List of input FastQs and checksums (text):

    - [20230216-pver-stringtie-Pver_genome_assembly_v1.0-isoforms/input_fastqs_checksums.md5](https://gannet.fish.washington.edu/Atumefaciens/20230216-pver-stringtie-Pver_genome_assembly_v1.0-isoforms/input_fastqs_checksums.md5)

  - Merged GTF file (GTF; 37MB):

    - [20230216-pver-stringtie-Pver_genome_assembly_v1.0-isoforms/Pver_genome_assembly_v1.0.stringtie.gtf](https://gannet.fish.washington.edu/Atumefaciens/20230216-pver-stringtie-Pver_genome_assembly_v1.0-isoforms/Pver_genome_assembly_v1.0.stringtie.gtf)

  - Merged BAM file (95GB):

    - [20230216-pver-stringtie-Pver_genome_assembly_v1.0-isoforms/20230216-pver-stringtie-pver_v1.0-sorted-bams-merged.bam](https://gannet.fish.washington.edu/Atumefaciens/20230216-pver-stringtie-Pver_genome_assembly_v1.0-isoforms/20230216-pver-stringtie-pver_v1.0-sorted-bams-merged.bam)

      - MD5 checksum:

        - `6b57660fa1795fdb478850ad09b88b51`

    - Merged BAM index file (useful for IGV; 11MB):

      - [20230216-pver-stringtie-Pver_genome_assembly_v1.0-isoforms/20230216-pver-stringtie-pver_v1.0-sorted-bams-merged.bam.bai](https://gannet.fish.washington.edu/Atumefaciens/20230216-pver-stringtie-Pver_genome_assembly_v1.0-isoforms/20230216-pver-stringtie-pver_v1.0-sorted-bams-merged.bam.bai)

    - [`gffcompare`](https://ccb.jhu.edu/software/stringtie/gffcompare.shtml) files (text):

      - `Pver_genome_assembly_v1.0-gffcmp*`: See link above for file descriptions. Contents of summary file (`Pver_genome_assembly_v1.0-gffcmp`):

          ```
          # gffcompare v0.12.6 | Command line was:
          #/gscratch/srlab/programs/gffcompare-0.12.6.Linux_x86_64/gffcompare -r /gscratch/srlab/sam/data/P_verrucosa/genomes/Pver_genome_assembly_v1.0-valid.gff3 -o Pver_genome_assembly_v1.0-gffcmp Pver_genome_assembly_v1.0.stringtie.gtf
          #

          #= Summary for dataset: Pver_genome_assembly_v1.0.stringtie.gtf 
          #     Query mRNAs :   27730 in   27400 loci  (24124 multi-exon transcripts)
          #            (294 multi-transcript loci, ~1.0 transcripts per locus)
          # Reference mRNAs :   27695 in   27400 loci  (24124 multi-exon)
          # Super-loci w/ reference transcripts:    27400
          #-----------------| Sensitivity | Precision  |
                  Base level:   100.0     |   100.0    |
                  Exon level:   100.0     |   100.0    |
                Intron level:   100.0     |   100.0    |
          Intron chain level:   100.0     |   100.0    |
            Transcript level:   100.0     |    99.9    |
                Locus level:   100.0     |   100.0    |

              Matching intron chains:   24124
                Matching transcripts:   27695
                        Matching loci:   27400

                    Missed exons:       0/208892	(  0.0%)
                    Novel exons:       0/208892	(  0.0%)
                  Missed introns:       0/181193	(  0.0%)
                  Novel introns:       0/181193	(  0.0%)
                    Missed loci:       0/27400	(  0.0%)
                      Novel loci:       0/27400	(  0.0%)

          Total union super-loci across all input datasets: 27400 
          27730 out of 27730 consensus transcripts written in Pver_genome_assembly_v1.0-gffcmp.annotated.gtf (0 discarded as redundant)
          ```
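
To compare summaries like the one above across runs, the sensitivity and precision rows can be pulled into a tab-delimited table. This is only a sketch: the function name `gffcmp_levels` is hypothetical, and it assumes the summary keeps the `<name> level:   <sensitivity> | <precision> |` row layout shown above.

```shell
#!/bin/bash

# Hypothetical helper (not part of the original workflow).
# Prints tab-delimited: level, sensitivity, precision
# for each "<name> level:" row of a gffcompare summary file.
gffcmp_levels() {
  awk -F'[:|]' '
    / level:/ {
      # Strip surrounding whitespace from the first three fields
      for (i = 1; i <= 3; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
      printf "%s\t%s\t%s\n", $1, $2, $3
    }
  ' "$1"
}

# Only run if the summary file is present in the current directory
summary="Pver_genome_assembly_v1.0-gffcmp"
if [[ -f "${summary}" ]]; then
  gffcmp_levels "${summary}"
fi
```

For the summary shown above, the transcript row would come out as `Transcript level`, `100.0`, `99.9` separated by tabs.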

Since there are a large number of folders/files, the resulting directory structure for all of the [`StringTie`](https://ccb.jhu.edu/software/stringtie/) output is shown at the end of this post. Here's a description of all the file types found in each directory:

- `*.ctab`: See [`ballgown` documentation](https://github.com/alyssafrazee/ballgown) for description of these.

- `*_checksums.md5`: MD5 checksums for all files in each directory.

- `*.cov_refs.gtf`: GTF of reference transcripts that are fully covered by reads, generated by [`StringTie`](https://ccb.jhu.edu/software/stringtie/) via the `-C` option.

- `*.gtf`: Final GTF file produced by [`StringTie`](https://ccb.jhu.edu/software/stringtie/) for each sample.

- `*-hisat2_stats.txt`: Standard error output from [`HISAT2`](https://daehwankimlab.github.io/hisat2/). Contains alignment info.

- `*.sorted.bam`: Sorted BAM file of the [`HISAT2`](https://daehwankimlab.github.io/hisat2/) alignments (converted from SAM and sorted with `samtools`).

- `*.sorted.bam.bai`: BAM index file.
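
The per-sample `*_checksums.md5` files can be used to confirm that a downloaded copy of the output is intact. Below is a minimal sketch, assuming GNU `md5sum` is available. The function name `verify_sample_checksums` is hypothetical and not part of the original script.

```shell
#!/bin/bash

# Hypothetical helper (not part of the original script).
# Verifies every *_checksums.md5 file found one level below
# the given output directory; returns non-zero on any mismatch.
verify_sample_checksums() {
  local out_dir="$1"
  local status=0
  local md5_file

  for md5_file in "${out_dir}"/*/*_checksums.md5
  do
    # Skip if the glob matched nothing
    [[ -e "${md5_file}" ]] || continue

    # md5sum --check resolves file names relative to the current
    # directory, so run it from inside the sample directory.
    if ! (cd "${md5_file%/*}" && md5sum --check --quiet "${md5_file##*/}")
    then
      echo "Checksum mismatch in ${md5_file}" >&2
      status=1
    fi
  done

  return "${status}"
}
```

Calling `verify_sample_checksums .` from the top of the output folder would check each sample directory in turn. The subshell keeps the `cd` from changing the caller's working directory.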


Next up is to get these files imported into [`ballgown`](https://github.com/alyssafrazee/ballgown) for expression/isoform analyses.
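
Before that import, it may be worth confirming that every sample directory contains the five `*.ctab` tables seen in the directory listing in this post. The sketch below is one way to do that. The function name `find_incomplete_ctab_dirs` is hypothetical, and it assumes every subdirectory of the output folder is a sample directory.

```shell
#!/bin/bash

# Hypothetical helper (not part of the original script).
# Reports sample directories under the given output directory that are
# missing (or have an empty copy of) any of the five *.ctab tables
# written by StringTie's -B option.
find_incomplete_ctab_dirs() {
  local out_dir="$1"
  local ctabs=(e2t.ctab e_data.ctab i2t.ctab i_data.ctab t_data.ctab)
  local sample_dir ctab

  for sample_dir in "${out_dir}"/*/
  do
    # Skip if the glob matched nothing
    [[ -d "${sample_dir}" ]] || continue

    for ctab in "${ctabs[@]}"
    do
      # -s is true only for files that exist and are non-empty
      if [[ ! -s "${sample_dir}${ctab}" ]]
      then
        echo "${sample_dir%/}: missing or empty ${ctab}"
      fi
    done
  done
}
```

No output from `find_incomplete_ctab_dirs .` means all tables are present. Any non-sample subdirectory, such as a `gffcompare` directory, would also be reported and can be ignored.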


```
[205G]  .
|-- [ 12K]  20230216-pver-stringtie-Pver_genome_assembly_v1.0-isoforms.sh
|-- [ 95G]  20230216-pver-stringtie-pver_v1.0-sorted-bams-merged.bam
|-- [ 11M]  20230216-pver-stringtie-pver_v1.0-sorted-bams-merged.bam.bai
|-- [3.0G]  C17
|   |-- [ 474]  C17_checksums.md5
|   |-- [7.8M]  C17.cov_refs.gtf
|   |-- [ 36M]  C17.gtf
|   |-- [1014]  C17-hisat2_stats.txt
|   |-- [3.0G]  C17.sorted.bam
|   |-- [1.7M]  C17.sorted.bam.bai
|   |-- [2.4M]  e2t.ctab
|   |-- [ 18M]  e_data.ctab
|   |-- [2.1M]  i2t.ctab
|   |-- [ 10M]  i_data.ctab
|   `-- [2.8M]  t_data.ctab
|-- [2.6G]  C18
|   |-- [ 474]  C18_checksums.md5
|   |-- [7.9M]  C18.cov_refs.gtf
|   |-- [ 36M]  C18.gtf
|   |-- [1011]  C18-hisat2_stats.txt
|   |-- [2.5G]  C18.sorted.bam
|   |-- [1.6M]  C18.sorted.bam.bai
|   |-- [2.4M]  e2t.ctab
|   |-- [ 18M]  e_data.ctab
|   |-- [2.1M]  i2t.ctab
|   |-- [ 10M]  i_data.ctab
|   `-- [2.8M]  t_data.ctab
|-- [2.8G]  C19
|   |-- [ 474]  C19_checksums.md5
|   |-- [7.8M]  C19.cov_refs.gtf
|   |-- [ 36M]  C19.gtf
|   |-- [1012]  C19-hisat2_stats.txt
|   |-- [2.7G]  C19.sorted.bam
|   |-- [1.6M]  C19.sorted.bam.bai
|   |-- [2.4M]  e2t.ctab
|   |-- [ 18M]  e_data.ctab
|   |-- [2.1M]  i2t.ctab
|   |-- [ 10M]  i_data.ctab
|   `-- [2.8M]  t_data.ctab
|-- [2.8G]  C20
|   |-- [ 474]  C20_checksums.md5
|   |-- [7.9M]  C20.cov_refs.gtf
|   |-- [ 36M]  C20.gtf
|   |-- [1011]  C20-hisat2_stats.txt
|   |-- [2.7G]  C20.sorted.bam
|   |-- [1.6M]  C20.sorted.bam.bai
|   |-- [2.4M]  e2t.ctab
|   |-- [ 18M]  e_data.ctab
|   |-- [2.1M]  i2t.ctab
|   |-- [ 10M]  i_data.ctab
|   `-- [2.8M]  t_data.ctab
|-- [1.9G]  C21
|   |-- [ 474]  C21_checksums.md5
|   |-- [7.4M]  C21.cov_refs.gtf
|   |-- [ 36M]  C21.gtf
|   |-- [1008]  C21-hisat2_stats.txt
|   |-- [1.9G]  C21.sorted.bam
|   |-- [1.4M]  C21.sorted.bam.bai
|   |-- [2.4M]  e2t.ctab
|   |-- [ 18M]  e_data.ctab
|   |-- [2.1M]  i2t.ctab
|   |-- [ 10M]  i_data.ctab
|   `-- [2.8M]  t_data.ctab
|-- [2.8G]  C22
|   |-- [ 474]  C22_checksums.md5
|   |-- [7.7M]  C22.cov_refs.gtf
|   |-- [ 36M]  C22.gtf
|   |-- [1014]  C22-hisat2_stats.txt
|   |-- [2.7G]  C22.sorted.bam
|   |-- [1.6M]  C22.sorted.bam.bai
|   |-- [2.4M]  e2t.ctab
|   |-- [ 18M]  e_data.ctab
|   |-- [2.1M]  i2t.ctab
|   |-- [ 10M]  i_data.ctab
|   `-- [2.8M]  t_data.ctab
|-- [2.4G]  C23
|   |-- [ 474]  C23_checksums.md5
|   |-- [7.5M]  C23.cov_refs.gtf
|   |-- [ 36M]  C23.gtf
|   |-- [1011]  C23-hisat2_stats.txt
|   |-- [2.3G]  C23.sorted.bam
|   |-- [1.5M]  C23.sorted.bam.bai
|   |-- [2.4M]  e2t.ctab
|   |-- [ 18M]  e_data.ctab
|   |-- [2.1M]  i2t.ctab
|   |-- [ 10M]  i_data.ctab
|   `-- [2.8M]  t_data.ctab
|-- [2.8G]  C24
|   |-- [ 474]  C24_checksums.md5
|   |-- [7.7M]  C24.cov_refs.gtf
|   |-- [ 36M]  C24.gtf
|   |-- [1014]  C24-hisat2_stats.txt
|   |-- [2.7G]  C24.sorted.bam
|   |-- [1.7M]  C24.sorted.bam.bai
|   |-- [2.4M]  e2t.ctab
|   |-- [ 18M]  e_data.ctab
|   |-- [2.1M]  i2t.ctab
|   |-- [ 10M]  i_data.ctab
|   `-- [2.8M]  t_data.ctab
|-- [3.6G]  C25
|   |-- [ 474]  C25_checksums.md5
|   |-- [8.2M]  C25.cov_refs.gtf
|   |-- [ 36M]  C25.gtf
|   |-- [1014]  C25-hisat2_stats.txt
|   |-- [3.5G]  C25.sorted.bam
|   |-- [1.8M]  C25.sorted.bam.bai
|   |-- [2.4M]  e2t.ctab
|   |-- [ 18M]  e_data.ctab
|   |-- [2.1M]  i2t.ctab
|   |-- [ 10M]  i_data.ctab
|   `-- [2.8M]  t_data.ctab
|-- [2.8G]  C26
|   |-- [ 474]  C26_checksums.md5
|   |-- [7.8M]  C26.cov_refs.gtf
|   |-- [ 36M]  C26.gtf
|   |-- [1014]  C26-hisat2_stats.txt
|   |-- [2.8G]  C26.sorted.bam
|   |-- [1.7M]  C26.sorted.bam.bai
|   |-- [2.4M]  e2t.ctab
|   |-- [ 18M]  e_data.ctab
|   |-- [2.1M]  i2t.ctab
|   |-- [ 10M]  i_data.ctab
|   `-- [2.8M]  t_data.ctab
|-- [3.0G]  C27
|   |-- [ 474]  C27_checksums.md5
|   |-- [7.9M]  C27.cov_refs.gtf
|   |-- [ 36M]  C27.gtf
|   |-- [1014]  C27-hisat2_stats.txt
|   |-- [2.9G]  C27.sorted.bam
|   |-- [1.7M]  C27.sorted.bam.bai
|   |-- [2.4M]  e2t.ctab
|   |-- [ 18M]  e_data.ctab
|   |-- [2.1M]  i2t.ctab
|   |-- [ 10M]  i_data.ctab
|   `-- [2.8M]  t_data.ctab
|-- [2.9G]  C28
|   |-- [ 474]  C28_checksums.md5
|   |-- [8.0M]  C28.cov_refs.gtf
|   |-- [ 36M]  C28.gtf
|   |-- [1014]  C28-hisat2_stats.txt
|   |-- [2.8G]  C28.sorted.bam
|   |-- [1.7M]  C28.sorted.bam.bai
|   |-- [2.4M]  e2t.ctab
|   |-- [ 18M]  e_data.ctab
|   |-- [2.1M]  i2t.ctab
|   |-- [ 10M]  i_data.ctab
|   `-- [2.8M]  t_data.ctab
|-- [2.2G]  C29
|   |-- [ 474]  C29_checksums.md5
|   |-- [7.6M]  C29.cov_refs.gtf
|   |-- [ 36M]  C29.gtf
|   |-- [1010]  C29-hisat2_stats.txt
|   |-- [2.1G]  C29.sorted.bam
|   |-- [1.6M]  C29.sorted.bam.bai
|   |-- [2.4M]  e2t.ctab
|   |-- [ 18M]  e_data.ctab
|   |-- [2.1M]  i2t.ctab
|   |-- [ 10M]  i_data.ctab
|   `-- [2.8M]  t_data.ctab
|-- [3.2G]  C30
|   |-- [ 474]  C30_checksums.md5
|   |-- [7.9M]  C30.cov_refs.gtf
|   |-- [ 36M]  C30.gtf
|   |-- [1014]  C30-hisat2_stats.txt
|   |-- [3.1G]  C30.sorted.bam
|   |-- [1.7M]  C30.sorted.bam.bai
|   |-- [2.4M]  e2t.ctab
|   |-- [ 18M]  e_data.ctab
|   |-- [2.1M]  i2t.ctab
|   |-- [ 10M]  i_data.ctab
|   `-- [2.8M]  t_data.ctab
|-- [2.8G]  C31
|   |-- [ 474]  C31_checksums.md5
|   |-- [7.9M]  C31.cov_refs.gtf
|   |-- [ 36M]  C31.gtf
|   |-- [1012]  C31-hisat2_stats.txt
|   |-- [2.7G]  C31.sorted.bam
|   |-- [1.6M]  C31.sorted.bam.bai
|   |-- [2.4M]  e2t.ctab
|   |-- [ 18M]  e_data.ctab
|   |-- [2.1M]  i2t.ctab
|   |-- [ 10M]  i_data.ctab
|   `-- [2.8M]  t_data.ctab
|-- [3.2G]  C32
|   |-- [ 474]  C32_checksums.md5
|   |-- [8.0M]  C32.cov_refs.gtf
|   |-- [ 36M]  C32.gtf
|   |-- [1014]  C32-hisat2_stats.txt
|   |-- [3.1G]  C32.sorted.bam
|   |-- [1.7M]  C32.sorted.bam.bai
|   |-- [2.4M]  e2t.ctab
|   |-- [ 18M]  e_data.ctab
|   |-- [2.1M]  i2t.ctab
|   |-- [ 10M]  i_data.ctab
|   `-- [2.8M]  t_data.ctab
|-- [1.1K]  checksums.md5
|-- [ 23G]  E1
|   |-- [ 469]  E1_checksums.md5
|   |-- [9.8M]  E1.cov_refs.gtf
|   |-- [ 36M]  E1.gtf
|   |-- [1022]  E1-hisat2_stats.txt
|   |-- [ 22G]  E1.sorted.bam
|   |-- [4.0M]  E1.sorted.bam.bai
|   |-- [2.4M]  e2t.ctab
|   |-- [ 19M]  e_data.ctab
|   |-- [2.1M]  i2t.ctab
|   |-- [ 11M]  i_data.ctab
|   `-- [2.9M]  t_data.ctab
|-- [3.5G]  E10
|   |-- [ 474]  E10_checksums.md5
|   |-- [8.0M]  E10.cov_refs.gtf
|   |-- [ 36M]  E10.gtf
|   |-- [1014]  E10-hisat2_stats.txt
|   |-- [3.4G]  E10.sorted.bam
|   |-- [1.8M]  E10.sorted.bam.bai
|   |-- [2.4M]  e2t.ctab
|   |-- [ 18M]  e_data.ctab
|   |-- [2.1M]  i2t.ctab
|   |-- [ 10M]  i_data.ctab
|   `-- [2.8M]  t_data.ctab
|-- [2.2G]  E11
|   |-- [ 474]  E11_checksums.md5
|   |-- [7.6M]  E11.cov_refs.gtf
|   |-- [ 36M]  E11.gtf
|   |-- [1008]  E11-hisat2_stats.txt
|   |-- [2.1G]  E11.sorted.bam
|   |-- [1.5M]  E11.sorted.bam.bai
|   |-- [2.4M]  e2t.ctab
|   |-- [ 18M]  e_data.ctab
|   |-- [2.1M]  i2t.ctab
|   |-- [ 10M]  i_data.ctab
|   `-- [2.8M]  t_data.ctab
|-- [3.4G]  E12
|   |-- [ 474]  E12_checksums.md5
|   |-- [8.0M]  E12.cov_refs.gtf
|   |-- [ 36M]  E12.gtf
|   |-- [1014]  E12-hisat2_stats.txt
|   |-- [3.3G]  E12.sorted.bam
|   |-- [1.7M]  E12.sorted.bam.bai
|   |-- [2.4M]  e2t.ctab
|   |-- [ 18M]  e_data.ctab
|   |-- [2.1M]  i2t.ctab
|   |-- [ 10M]  i_data.ctab
|   `-- [2.8M]  t_data.ctab
|-- [3.0G]  E13
|   |-- [ 474]  E13_checksums.md5
|   |-- [7.9M]  E13.cov_refs.gtf
|   |-- [ 36M]  E13.gtf
|   |-- [1014]  E13-hisat2_stats.txt
|   |-- [2.9G]  E13.sorted.bam
|   |-- [1.7M]  E13.sorted.bam.bai
|   |-- [2.4M]  e2t.ctab
|   |-- [ 18M]  e_data.ctab
|   |-- [2.1M]  i2t.ctab
|   |-- [ 10M]  i_data.ctab
|   `-- [2.8M]  t_data.ctab
|-- [2.6G]  E14
|   |-- [ 474]  E14_checksums.md5
|   |-- [7.8M]  E14.cov_refs.gtf
|   |-- [ 36M]  E14.gtf
|   |-- [1011]  E14-hisat2_stats.txt
|   |-- [2.5G]  E14.sorted.bam
|   |-- [1.6M]  E14.sorted.bam.bai
|   |-- [2.4M]  e2t.ctab
|   |-- [ 18M]  e_data.ctab
|   |-- [2.1M]  i2t.ctab
|   |-- [ 10M]  i_data.ctab
|   `-- [2.8M]  t_data.ctab
|-- [2.8G]  E15
|   |-- [ 474]  E15_checksums.md5
|   |-- [7.7M]  E15.cov_refs.gtf
|   |-- [ 36M]  E15.gtf
|   |-- [1013]  E15-hisat2_stats.txt
|   |-- [2.8G]  E15.sorted.bam
|   |-- [1.6M]  E15.sorted.bam.bai
|   |-- [2.4M]  e2t.ctab
|   |-- [ 18M]  e_data.ctab
|   |-- [2.1M]  i2t.ctab
|   |-- [ 10M]  i_data.ctab
|   `-- [2.8M]  t_data.ctab
|-- [3.3G]  E16
|   |-- [ 474]  E16_checksums.md5
|   |-- [8.1M]  E16.cov_refs.gtf
|   |-- [ 36M]  E16.gtf
|   |-- [1013]  E16-hisat2_stats.txt
|   |-- [3.2G]  E16.sorted.bam
|   |-- [1.7M]  E16.sorted.bam.bai
|   |-- [2.4M]  e2t.ctab
|   |-- [ 18M]  e_data.ctab
|   |-- [2.1M]  i2t.ctab
|   |-- [ 10M]  i_data.ctab
|   `-- [2.8M]  t_data.ctab
|-- [2.9G]  E2
|   |-- [ 469]  E2_checksums.md5
|   |-- [8.0M]  E2.cov_refs.gtf
|   |-- [ 36M]  E2.gtf
|   |-- [1014]  E2-hisat2_stats.txt
|   |-- [2.8G]  E2.sorted.bam
|   |-- [1.7M]  E2.sorted.bam.bai
|   |-- [2.4M]  e2t.ctab
|   |-- [ 18M]  e_data.ctab
|   |-- [2.1M]  i2t.ctab
|   |-- [ 10M]  i_data.ctab
|   `-- [2.8M]  t_data.ctab
|-- [2.9G]  E3
|   |-- [2.4M]  e2t.ctab
|   |-- [ 469]  E3_checksums.md5
|   |-- [7.9M]  E3.cov_refs.gtf
|   |-- [ 36M]  E3.gtf
|   |-- [1012]  E3-hisat2_stats.txt
|   |-- [2.8G]  E3.sorted.bam
|   |-- [1.6M]  E3.sorted.bam.bai
|   |-- [ 18M]  e_data.ctab
|   |-- [2.1M]  i2t.ctab
|   |-- [ 10M]  i_data.ctab
|   `-- [2.8M]  t_data.ctab
|-- [2.5G]  E4
|   |-- [2.4M]  e2t.ctab
|   |-- [ 469]  E4_checksums.md5
|   |-- [7.7M]  E4.cov_refs.gtf
|   |-- [ 36M]  E4.gtf
|   |-- [1011]  E4-hisat2_stats.txt
|   |-- [2.4G]  E4.sorted.bam
|   |-- [1.5M]  E4.sorted.bam.bai
|   |-- [ 18M]  e_data.ctab
|   |-- [2.1M]  i2t.ctab
|   |-- [ 10M]  i_data.ctab
|   `-- [2.8M]  t_data.ctab
|-- [2.0G]  E5
|   |-- [2.4M]  e2t.ctab
|   |-- [ 469]  E5_checksums.md5
|   |-- [7.5M]  E5.cov_refs.gtf
|   |-- [ 36M]  E5.gtf
|   |-- [1010]  E5-hisat2_stats.txt
|   |-- [2.0G]  E5.sorted.bam
|   |-- [1.5M]  E5.sorted.bam.bai
|   |-- [ 18M]  e_data.ctab
|   |-- [2.1M]  i2t.ctab
|   |-- [ 10M]  i_data.ctab
|   `-- [2.8M]  t_data.ctab
|-- [2.8G]  E6
|   |-- [2.4M]  e2t.ctab
|   |-- [ 469]  E6_checksums.md5
|   |-- [7.7M]  E6.cov_refs.gtf
|   |-- [ 36M]  E6.gtf
|   |-- [1011]  E6-hisat2_stats.txt
|   |-- [2.7G]  E6.sorted.bam
|   |-- [1.6M]  E6.sorted.bam.bai
|   |-- [ 18M]  e_data.ctab
|   |-- [2.1M]  i2t.ctab
|   |-- [ 10M]  i_data.ctab
|   `-- [2.8M]  t_data.ctab
|-- [2.7G]  E7
|   |-- [2.4M]  e2t.ctab
|   |-- [ 469]  E7_checksums.md5
|   |-- [8.0M]  E7.cov_refs.gtf
|   |-- [ 36M]  E7.gtf
|   |-- [1011]  E7-hisat2_stats.txt
|   |-- [2.7G]  E7.sorted.bam
|   |-- [1.6M]  E7.sorted.bam.bai
|   |-- [ 18M]  e_data.ctab
|   |-- [2.1M]  i2t.ctab
|   |-- [ 10M]  i_data.ctab
|   `-- [2.8M]  t_data.ctab
|-- [2.7G]  E8
|   |-- [2.4M]  e2t.ctab
|   |-- [ 469]  E8_checksums.md5
|   |-- [7.8M]  E8.cov_refs.gtf
|   |-- [ 36M]  E8.gtf
|   |-- [1011]  E8-hisat2_stats.txt
|   |-- [2.6G]  E8.sorted.bam
|   |-- [1.6M]  E8.sorted.bam.bai
|   |-- [ 18M]  e_data.ctab
|   |-- [2.1M]  i2t.ctab
|   |-- [ 10M]  i_data.ctab
|   `-- [2.8M]  t_data.ctab
|-- [3.1G]  E9
|   |-- [2.4M]  e2t.ctab
|   |-- [ 469]  E9_checksums.md5
|   |-- [7.8M]  E9.cov_refs.gtf
|   |-- [ 36M]  E9.gtf
|   |-- [1014]  E9-hisat2_stats.txt
|   |-- [3.0G]  E9.sorted.bam
|   |-- [1.7M]  E9.sorted.bam.bai
|   |-- [ 18M]  e_data.ctab
|   |-- [2.1M]  i2t.ctab
|   |-- [ 10M]  i_data.ctab
|   `-- [2.8M]  t_data.ctab
|-- [3.3K]  gtf_list.txt
|-- [8.4K]  input_fastqs_checksums.md5
|-- [1.5K]  Pver_genome_assembly_v1.0-gffcmp
|-- [ 33M]  Pver_genome_assembly_v1.0-gffcmp.annotated.gtf
|-- [2.6M]  Pver_genome_assembly_v1.0-gffcmp.loci
|-- [1.6M]  Pver_genome_assembly_v1.0-gffcmp.Pver_genome_assembly_v1.0.stringtie.gtf.refmap
|-- [3.1M]  Pver_genome_assembly_v1.0-gffcmp.Pver_genome_assembly_v1.0.stringtie.gtf.tmap
|-- [3.3M]  Pver_genome_assembly_v1.0-gffcmp.tracking
|-- [ 37M]  Pver_genome_assembly_v1.0.stringtie.gtf
|-- [ 87K]  slurm-4469382.out
`-- [ 654]  sorted_bams.list

 205G used in 32 directories, 367 files
```