---
title: "00.21-D-Apul-BS-genome"
author: "Sam White"
date: "2025-01-02"
output: 
  bookdown::html_document2:
    theme: cosmo
    toc: true
    toc_float: true
    number_sections: true
    code_folding: show
    code_download: true
  github_document:
    toc: true
    number_sections: true
  html_document:
    theme: cosmo
    toc: true
    toc_float: true
    number_sections: true
    code_folding: show
    code_download: true
bibliography: references.bib
---

# Background

This Rmd file uses Bismark [@krueger2011] to create a bisulfite-converted genome, for later use by Bismark, from the `Apulchra-genome.fa` file. The genome FastA was taken from the [deep-dive-expression Wiki](https://github.com/urol-e5/deep-dive-expression/wiki/00-Genomic-Resources#acropora-pulchra) (GitHub Wiki), but I'm not sure of its exact origin.

Because the output files are large, they cannot be synced to GitHub. Instead, the output directories are packaged as a gzipped tarball, available here:

- [https://gannet.fish.washington.edu/gitrepos/urol-e5/timeseries_molecular/D-Apul/data/Apulchra-genome-bisulfite.tar.gz](https://gannet.fish.washington.edu/gitrepos/urol-e5/timeseries_molecular/D-Apul/data/Apulchra-genome-bisulfite.tar.gz) (1.3GB)

- [https://gannet.fish.washington.edu/gitrepos/urol-e5/timeseries_molecular/D-Apul/data/Apulchra-genome-bisulfite.tar.gz.md5](https://gannet.fish.washington.edu/gitrepos/urol-e5/timeseries_molecular/D-Apul/data/Apulchra-genome-bisulfite.tar.gz.md5)

    - MD5: `9e1db5875f210007a43e2083f01c2db9`

# Inputs

- Directory containing a FastA file with the file extension `.fa` or `.fasta` (either may also be gzipped and end in `.gz`).

# Outputs

- CT conversion

    - Bowtie2 index files.

    - CT conversion FastA.

- GA conversion

    - Bowtie2 index files.

    - GA conversion FastA.

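
A quick way to confirm that the outputs listed above exist once the conversion has run is sketched below. This is a minimal sketch, not part of Bismark: `check_bs_genome` is a hypothetical helper, and the subdirectory names `CT_conversion` and `GA_conversion` are assumptions based on Bismark's usual `Bisulfite_Genome` layout. It only checks that each conversion directory holds at least one `.fa` file and at least one Bowtie2 index file (`.bt2` or `.bt2l`). It does not validate their contents.

```bash
# Hypothetical helper (not part of Bismark): checks that a Bisulfite_Genome
# directory holds a converted FastA and Bowtie2 index files for both conversions.
check_bs_genome() {
  local bs_dir="$1"
  local status=0
  local conv

  # Assumed subdirectory names; adjust if your Bismark version differs.
  for conv in CT_conversion GA_conversion; do
    if [ ! -d "${bs_dir}/${conv}" ]; then
      echo "MISSING directory: ${bs_dir}/${conv}"
      status=1
      continue
    fi

    # Converted FastA: any *.fa file in the conversion directory
    if ! ls "${bs_dir}/${conv}"/*.fa >/dev/null 2>&1; then
      echo "MISSING FastA in: ${bs_dir}/${conv}"
      status=1
    fi

    # Bowtie2 index files: *.bt2, or *.bt2l for large indexes
    if ! ls "${bs_dir}/${conv}"/*.bt2* >/dev/null 2>&1; then
      echo "MISSING Bowtie2 index in: ${bs_dir}/${conv}"
      status=1
    fi
  done

  if [ "${status}" -eq 0 ]; then
    echo "OK: ${bs_dir}"
  fi

  return "${status}"
}
```

After the conversion chunk has run, the helper could be called as `source .bashvars` followed by `check_bs_genome "${genome_dir}/Bisulfite_Genome"`. A non-zero exit status means at least one expected output was not found.
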
```{r setup, include=FALSE}
library(knitr)
knitr::opts_chunk$set(
  echo = TRUE,         # Display code chunks
  eval = FALSE,        # Do not evaluate code chunks unless a chunk sets eval=TRUE
  warning = FALSE,     # Hide warnings
  message = FALSE,     # Hide messages
  comment = ""         # Prevents prepending '##' to lines in code output
)
```

# Create a Bash variables file

This allows usage of Bash variables across R Markdown chunks.

```{r save-bash-variables-to-rvars-file, engine='bash', eval=TRUE}
{
echo "#### Assign Variables ####"
echo ""

echo "# Data directories"
echo 'export timeseries_dir=/home/shared/8TB_HDD_01/sam/gitrepos/urol-e5/timeseries_molecular'
echo 'export output_dir_top=${timeseries_dir}/D-Apul/data'
echo 'export genome_dir=${timeseries_dir}/D-Apul/data'
echo ""

echo "# Paths to programs"
echo 'export programs_dir="/home/shared"'
echo 'export bismark_dir="${programs_dir}/Bismark-0.24.0"'
echo 'export bowtie2_dir="${programs_dir}/bowtie2-2.4.4-linux-x86_64"'
echo ""

echo "# Set number of CPUs to use"
echo 'export threads=20'
echo ""

echo "# Print formatting"
echo 'export line="--------------------------------------------------------"'
echo ""
} > .bashvars

# Print the variables file to confirm its contents
cat .bashvars
```

# Bisulfite conversion

```{r bismark-genome-conversion, engine='bash', eval=FALSE}
# Load bash variables into memory
source .bashvars

# Convert the genome (CT and GA) and build Bowtie2 indexes for each conversion
"${bismark_dir}/bismark_genome_preparation" \
"${genome_dir}" \
--parallel "${threads}" \
--bowtie2 \
--path_to_aligner "${bowtie2_dir}"
```

# Inspect BS output

```{r inspect-BS-output, engine='bash', eval=TRUE}
# Load bash variables into memory
source .bashvars

# List the converted genome directory with human-readable file sizes
tree -h "${genome_dir}/Bisulfite_Genome"
```

# REFERENCES