---
title: "11-multispecies-RNASeq-trimming"
output: html_document
date: "2025-02-18"
---

```{bash, eval=FALSE}
# Code chunk from Sam's shell script (reference only, not evaluated when knitting:
# the variables below are defined elsewhere in that script)
# Run fastp
# Specifies reports in HTML and JSON formats
/home/shared/fastp \
  --in1 "${fastq_array_R1[index]}" \
  --in2 "${fastq_array_R2[index]}" \
  --detect_adapter_for_pe \
  --thread "${threads}" \
  --html "${sample_name}".fastp-trim."${timestamp}".report.html \
  --json "${sample_name}".fastp-trim."${timestamp}".report.json \
  --out1 "${R1_sample_name}".fastp-trim."${timestamp}".fq.gz \
  --out2 "${R2_sample_name}".fastp-trim."${timestamp}".fq.gz
```

```{bash}
# Set the directory containing FASTQ files
FASTQ_DIR="/home/shared/8TB_HDD_02/graceac9/multispecies2023"
THREADS=16 # Adjust as needed
OUTDIR="../output/11-multi-fastp"

# Create the output directory if it does not already exist
mkdir -p "${OUTDIR}"

# Loop through all R1 files in the directory
for R1_FILE in "${FASTQ_DIR}"/*_R1_001.fastq.gz; do
  # Derive corresponding R2 file name
  R2_FILE="${R1_FILE/_R1_001.fastq.gz/_R2_001.fastq.gz}"

  # Ensure the R2 file exists
  if [[ ! -f "$R2_FILE" ]]; then
    echo "Skipping ${R1_FILE}, no matching R2 file found."
    continue
  fi

  # Extract the sample name (strip the directory and the R1 suffix)
  SAMPLE_NAME=$(basename "$R1_FILE" _R1_001.fastq.gz)

  # Define output file names
  OUT_R1="${OUTDIR}/${SAMPLE_NAME}_R1.fastp-trim.fq.gz"
  OUT_R2="${OUTDIR}/${SAMPLE_NAME}_R2.fastp-trim.fq.gz"
  HTML_REPORT="${OUTDIR}/${SAMPLE_NAME}.fastp-trim.report.html"
  JSON_REPORT="${OUTDIR}/${SAMPLE_NAME}.fastp-trim.report.json"

  # Run fastp: auto-detect paired-end adapters and trim the
  # first 10 bases from the start of both R1 and R2
  /home/shared/fastp \
    --in1 "$R1_FILE" \
    --in2 "$R2_FILE" \
    --detect_adapter_for_pe \
    --trim_front1 10 \
    --trim_front2 10 \
    --thread "$THREADS" \
    --html "$HTML_REPORT" \
    --json "$JSON_REPORT" \
    --out1 "$OUT_R1" \
    --out2 "$OUT_R2"

  echo "Finished processing: $SAMPLE_NAME"
done

echo "All samples processed."
```

```{bash}
# Set CPU threads to use
threads=48

# Output directory for FastQC reports; create it if it does not already exist
fastqc_outdir="/home/shared/8TB_HDD_02/graceac9/fastqc/trimmedmusp"
mkdir -p "${fastqc_outdir}"

# Populate array with the trimmed FastQ files
fastq_array=(/home/shared/8TB_HDD_02/graceac9/GitHub/project-pycno-multispecies-2023/output/11-multi-fastp/*.fq.gz)

# Pass array contents to new variable
fastqc_list=$(echo "${fastq_array[*]}")

# Run FastQC
# NOTE: Do NOT quote ${fastqc_list}; it must be word-split so that
# each FastQ file is passed to FastQC as a separate argument
/home/shared/FastQC-0.12.1/fastqc \
  --threads ${threads} \
  --outdir "${fastqc_outdir}" \
  ${fastqc_list}
```

FastQC files are in: `/home/shared/8TB_HDD_02/graceac9/fastqc/trimmedmusp`

To summarize them with MultiQC, first activate conda in the terminal of the RStudio project:

`eval "$(/opt/anaconda/anaconda3/bin/conda shell.bash hook)"`

`conda activate`

Then navigate into the directory `/home/shared/8TB_HDD_02/graceac9/fastqc/trimmedmusp` and run:

`multiqc .`

The report (`multiqc_report.html`) is generated within seconds. To view it, transfer the HTML file to Owl or Gannet.

In the terminal, while still in the directory where the MultiQC report lives, run the following to `rsync` the file to the directory on Owl:

`rsync --archive --progress --verbose multiqc_report.html grace@owl.fish.washington.edu:/volume1/web/gcrandall/multispeciesSSWD/QCreports`

The report now lives on Owl: http://owl.fish.washington.edu/gcrandall/multispeciesSSWD/QCreports/multiqc_report_trimmedRNAseqData.html

* NOTE: On Owl, I renamed the MultiQC report to "multiqc_report_trimmedRNAseqData.html" because that directory will also hold another report, from the untrimmed data.
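As an optional check after the fastp loop, the sketch below confirms that every `*_R1_001.fastq.gz` input has trimmed R1 and R2 files plus both reports (non-empty) in the output directory, using the same file-naming pattern as the loop. The function name `check_fastp_outputs` is hypothetical, and the demo runs on placeholder files in a temporary directory, not on real data.

```{bash}
# Hypothetical helper: list fastp outputs that are missing or empty.
# Arguments: input FASTQ directory, fastp output directory.
# Prints one line per missing file and returns 1 if anything is missing.
check_fastp_outputs() {
  local fastq_dir="$1" outdir="$2" missing=0 r1 sample f
  for r1 in "$fastq_dir"/*_R1_001.fastq.gz; do
    [[ -e "$r1" ]] || continue   # the glob matched nothing
    sample=$(basename "$r1" _R1_001.fastq.gz)
    # Same output names as the fastp loop
    for f in "${sample}_R1.fastp-trim.fq.gz" \
             "${sample}_R2.fastp-trim.fq.gz" \
             "${sample}.fastp-trim.report.html" \
             "${sample}.fastp-trim.report.json"; do
      if [[ ! -s "$outdir/$f" ]]; then   # -s: exists and is non-empty
        echo "MISSING: $outdir/$f"
        missing=1
      fi
    done
  done
  return "$missing"
}

# Demo on placeholder files: sampleA is complete, sampleB lacks three outputs
demo=$(mktemp -d)
mkdir -p "$demo/in" "$demo/out"
touch "$demo/in/sampleA_R1_001.fastq.gz" "$demo/in/sampleB_R1_001.fastq.gz"
for f in sampleA_R1.fastp-trim.fq.gz sampleA_R2.fastp-trim.fq.gz \
         sampleA.fastp-trim.report.html sampleA.fastp-trim.report.json \
         sampleB_R1.fastp-trim.fq.gz; do
  echo "placeholder" > "$demo/out/$f"
done
check_fastp_outputs "$demo/in" "$demo/out" || echo "Some fastp outputs are missing."
rm -rf "$demo"
```

Because knitr runs each `bash` chunk in a separate shell, the function has to be called from the same chunk that defines it, e.g. `check_fastp_outputs /home/shared/8TB_HDD_02/graceac9/multispecies2023 ../output/11-multi-fastp`. Samples the loop skipped for lacking an R2 file will also be reported as missing.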
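The NOTE in the FastQC chunk relies on word splitting: the unquoted `${fastqc_list}` is split on whitespace into one argument per file. The sketch below uses a hypothetical `count_args` helper and made-up file names (one containing a space) to show what each form actually passes to a command, and why expanding the array directly as `"${fastq_array[@]}"` is a more robust alternative.

```{bash}
# Hypothetical helper: print how many arguments a command receives
count_args() { echo "$#"; }

# Made-up file names; the last one contains a space
files=("sampleA_R1.fastp-trim.fq.gz" "sampleA_R2.fastp-trim.fq.gz" "sample B_R1.fastp-trim.fq.gz")

# Same pattern as the FastQC chunk: join the array into one string
joined=$(echo "${files[*]}")

count_args ${joined}      # prints 4: unquoted, so "sample B..." is split in two
count_args "${joined}"    # prints 1: quoted, so the whole list is one argument
count_args "${files[@]}"  # prints 3: exactly one argument per array element
```

With the real paths, which contain no spaces, the first and third forms pass identical arguments, so `${fastqc_list}` could be replaced by `"${fastq_array[@]}"` without changing what FastQC receives.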
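The rename on Owl could also be avoided by naming the report when it is created, using MultiQC's `--filename` (`-n`) and `--outdir` (`-o`) options, and by giving `rsync` a full destination file path instead of just a directory. This is a sketch, not the workflow used above: the function name `publish_multiqc_report` and the `DRY_RUN` switch are hypothetical, and it assumes `multiqc` is on the `PATH` (e.g. after the conda activation step) and that SSH access to Owl is already set up. By default it only prints the commands it would run.

```{bash}
# Hypothetical helper: run MultiQC on a FastQC directory and copy the report
# to a remote directory under the same descriptive name.
# With DRY_RUN=1 (the default here) the commands are printed, not executed.
publish_multiqc_report() {
  local qc_dir="$1" report_base="$2" remote_dir="$3"
  local run=()
  if [[ "${DRY_RUN:-1}" == "1" ]]; then
    run=(echo)
  fi
  # --filename sets the report name (MultiQC appends .html);
  # --outdir writes the report into the FastQC directory itself
  "${run[@]}" multiqc --filename "$report_base" --outdir "$qc_dir" "$qc_dir"
  # A full destination path makes rsync name the remote copy directly
  "${run[@]}" rsync --archive --progress --verbose \
    "$qc_dir/${report_base}.html" "$remote_dir/${report_base}.html"
}

publish_multiqc_report \
  /home/shared/8TB_HDD_02/graceac9/fastqc/trimmedmusp \
  multiqc_report_trimmedRNAseqData \
  grace@owl.fish.washington.edu:/volume1/web/gcrandall/multispeciesSSWD/QCreports
```

Prefix the call with `DRY_RUN=0` to actually run MultiQC and `rsync`. The same function would serve the untrimmed-data report by passing a different directory and report name.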