---
title: "00.21-E-Peve-BS-genome"
author: "Sam White"
date: "2025-01-02"
output:
  bookdown::html_document2:
    theme: cosmo
    toc: true
    toc_float: true
    number_sections: true
    code_folding: show
    code_download: true
  github_document:
    toc: true
    number_sections: true
  html_document:
    theme: cosmo
    toc: true
    toc_float: true
    number_sections: true
    code_folding: show
    code_download: true
bibliography: references.bib
---

# Background

This Rmd file creates a bisulfite-converted genome with, and for use by, Bismark [@krueger2011], using the `Porites_evermanni_v1.fa` file. The genome FastA was taken from the [Genoscope corals webpage](https://www.genoscope.cns.fr/corals/genomes.html).

Because the output files are large, they cannot be synced to GitHub. Instead, the output directory is compressed into a gzipped tarball and made available here:

- [https://gannet.fish.washington.edu/gitrepos/urol-e5/timeseries_molecular/E-Peve/data/Bisulfite_Genome.tar.gz](https://gannet.fish.washington.edu/gitrepos/urol-e5/timeseries_molecular/E-Peve/data/Bisulfite_Genome.tar.gz) (1.5GB)

- [https://gannet.fish.washington.edu/gitrepos/urol-e5/timeseries_molecular/E-Peve/data/Bisulfite_Genome.tar.gz.md5](https://gannet.fish.washington.edu/gitrepos/urol-e5/timeseries_molecular/E-Peve/data/Bisulfite_Genome.tar.gz.md5)

    - MD5: `5a0d4f699d7d46eb9f996e677841582a`

# Inputs

- Directory containing a FastA file with the extension `.fa` or `.fasta` (optionally gzipped, e.g. `.fa.gz`).

# Outputs

- CT conversion

    - Bowtie2 index files.

    - CT conversion FastA.

- GA conversion

    - Bowtie2 index files.

    - GA conversion FastA.

```{r setup, include=FALSE}
library(knitr)
knitr::opts_chunk$set(
  echo = TRUE,         # Display code chunks
  eval = FALSE,        # Do not evaluate code chunks by default; chunks opt in with eval=TRUE
  warning = FALSE,     # Hide warnings
  message = FALSE,     # Hide messages
  comment = ""         # Prevents appending '##' to beginning of lines in code output
)
```

# Create a Bash variables file

This allows usage of Bash variables across R Markdown chunks.
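
Each `engine='bash'` chunk is run by knitr in a separate shell process, so a variable set in one chunk is gone by the time the next chunk runs. Writing `export` statements to a file and sourcing that file at the top of each chunk works around this. The following is a minimal, standalone sketch of that pattern, not part of the pipeline itself; the file name `demo.vars` and the variables `demo_dir`, `demo_output`, and `demo_threads` are hypothetical and exist only for illustration.

```bash
#!/usr/bin/env bash
# Sketch: share variables between separate shell processes via a sourced file.
# NOTE: demo.vars, demo_dir, demo_output and demo_threads are hypothetical names.
set -euo pipefail

vars_file="$(mktemp -d)/demo.vars"

# "Chunk 1": write export statements to the variables file.
# Single quotes keep ${demo_dir} unexpanded until the file is sourced.
{
  echo 'export demo_dir=/tmp/demo'
  echo 'export demo_output=${demo_dir}/output'
  echo 'export demo_threads=4'
} > "${vars_file}"

# "Chunk 2": a new shell process starts without those variables,
# so it sources the file before using them.
result="$(bash -c 'source "$1"; echo "${demo_output} ${demo_threads}"' _ "${vars_file}")"

echo "${result}"   # prints: /tmp/demo/output 4
```

The single-quoted `echo` lines matter: they defer expansion of nested variables until the file is sourced, so editing one base path updates every path derived from it. The chunk that follows applies the same pattern with this project's actual paths.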
```{r save-bash-variables-to-rvars-file, engine='bash', eval=TRUE}
{
echo "#### Assign Variables ####"
echo ""

echo "# Data directories"
echo 'export timeseries_dir=/home/shared/8TB_HDD_01/sam/gitrepos/urol-e5/timeseries_molecular'
echo 'export output_dir_top=${timeseries_dir}/E-Peve/data'
echo 'export genome_dir=${timeseries_dir}/E-Peve/data'
echo ""

echo "# Paths to programs"
echo 'export programs_dir="/home/shared"'
echo 'export bismark_dir="${programs_dir}/Bismark-0.24.0"'
echo 'export bowtie2_dir="${programs_dir}/bowtie2-2.4.4-linux-x86_64"'
echo ""

echo "# Set number of CPUs to use"
echo 'export threads=20'
echo ""

echo "# Print formatting"
echo 'export line="--------------------------------------------------------"'
echo ""
} > .bashvars

cat .bashvars
```

# Bisulfite conversion

```{r bismark-genome-conversion, engine='bash', eval=TRUE}
# Load bash variables into memory
source .bashvars

${bismark_dir}/bismark_genome_preparation \
${genome_dir} \
--parallel ${threads} \
--bowtie2 \
--path_to_aligner ${bowtie2_dir} \
1> ${genome_dir}/Peve-bs-genome.stderr
```

## Inspect BS output

```{r inspect-BS-output, engine='bash', eval=TRUE}
# Load bash variables into memory
source .bashvars

# List the generated directory tree with human-readable file sizes
tree -h ${genome_dir}/Bisulfite_Genome
```

## Compress output folder

```{r compress-BS-directory, engine='bash', eval=TRUE}
# Load bash variables into memory
source .bashvars

tar -czvf ${genome_dir}/Bisulfite_Genome.tar.gz ${genome_dir}/Bisulfite_Genome
```

## Create MD5sum

```{r md5sum, engine='bash', eval=TRUE}
# Load bash variables into memory
source .bashvars

cd ${genome_dir}

# Print the checksum and save it alongside the tarball
md5sum Bisulfite_Genome.tar.gz | tee Bisulfite_Genome.tar.gz.md5
```

# REFERENCES