00.10-E-Peve-BS-genome
================
Sam White
2025-01-02
- 1 Background
- 2 Inputs
- 3 Outputs
- 4 Create a Bash variables file
- 5 Bisulfite conversion
  - 5.1 Inspect BS output
  - 5.2 Compress output folder
  - 5.3 Create MD5sum
- 6 REFERENCES
# 1 Background
This Rmd file creates a bisulfite-converted genome with, and for use by,
Bismark (Krueger and Andrews 2011) using the `Porites_evermanni_v1.fa`
file. The genome FastA was taken from the [Genoscope corals
webpage](https://www.genoscope.cns.fr/corals/genomes.html).
Because the output files are large, they cannot be synced to GitHub.
Instead, the output directory is compressed into a gzipped tarball and
made available here:
- (1.5GB)
  - MD5: `5a0d4f699d7d46eb9f996e677841582a`
# 2 Inputs
- Directory containing a FastA file with the file extension `.fa` or
`.fasta`, optionally gzip-compressed (`.fa.gz` or `.fasta.gz`).
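Before running the conversion, it can be useful to confirm that the input directory actually holds a file with one of the accepted extensions. The sketch below is illustrative only: the helper name `list_fasta_files` and the throwaway demo directory are hypothetical, and it assumes a `find` that supports `-maxdepth`.

``` bash
# Hypothetical helper: list FastA files (plain or gzipped) directly inside a directory.
list_fasta_files() {
  local dir="$1"
  find "${dir}" -maxdepth 1 -type f \
    \( -name '*.fa' -o -name '*.fasta' -o -name '*.fa.gz' -o -name '*.fasta.gz' \) \
    | sort
}

# Demo on a throwaway directory (not the real genome directory).
demo_dir="$(mktemp -d)"
touch "${demo_dir}/genome.fa" "${demo_dir}/notes.txt" "${demo_dir}/extra.fasta.gz"

# Count matching files; notes.txt should be ignored.
fasta_count="$(list_fasta_files "${demo_dir}" | wc -l | tr -d ' ')"
echo "FastA files found: ${fasta_count}"
```

In the real workflow the same helper could be pointed at `${genome_dir}` after sourcing `.bashvars`.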
# 3 Outputs
- CT conversion
  - Bowtie2 index files.
  - CT conversion FastA.
- GA conversion
  - Bowtie2 index files.
  - GA conversion FastA.
# 4 Create a Bash variables file
This allows usage of Bash variables across R Markdown chunks.
``` bash
{
echo "#### Assign Variables ####"
echo ""
echo "# Data directories"
echo 'export timeseries_dir=/home/shared/8TB_HDD_01/sam/gitrepos/urol-e5/timeseries_molecular'
echo 'export output_dir_top=${timeseries_dir}/E-Peve/data'
echo 'export genome_dir=${timeseries_dir}/E-Peve/data'
echo ""
echo "# Paths to programs"
echo 'export programs_dir="/home/shared"'
echo 'export bismark_dir="${programs_dir}/Bismark-0.24.0"'
echo 'export bowtie2_dir="${programs_dir}/bowtie2-2.4.4-linux-x86_64"'
echo ""
echo "# Set number of CPUs to use"
echo 'export threads=20'
echo ""
echo "# Print formatting"
echo 'export line="--------------------------------------------------------"'
echo ""
} > .bashvars
cat .bashvars
```
#### Assign Variables ####
# Data directories
export timeseries_dir=/home/shared/8TB_HDD_01/sam/gitrepos/urol-e5/timeseries_molecular
export output_dir_top=${timeseries_dir}/E-Peve/data
export genome_dir=${timeseries_dir}/E-Peve/data
# Paths to programs
export programs_dir="/home/shared"
export bismark_dir="${programs_dir}/Bismark-0.24.0"
export bowtie2_dir="${programs_dir}/bowtie2-2.4.4-linux-x86_64"
# Set number of CPUs to use
export threads=20
# Print formatting
export line="--------------------------------------------------------"
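Each `bash` chunk in an R Markdown document runs in its own shell process, so variables set in one chunk are not visible in the next unless the variables file is sourced again. The sketch below imitates that behavior with two separate `bash -c` calls; the file location and the `demo_*` variable names are hypothetical. It also shows why the `echo` lines use single quotes: `${demo_top}` is written literally and only expanded when the file is sourced.

``` bash
# Write a tiny variables file to a temporary location (hypothetical demo values).
demo_vars="$(mktemp)"
{
  echo 'export demo_top=/tmp/demo'
  echo 'export demo_out=${demo_top}/output'
} > "${demo_vars}"

# Each "bash -c" call stands in for a separate chunk (a fresh shell process).
without_source="$(bash -c 'echo "${demo_out:-unset}"')"
with_source="$(bash -c 'source "$1"; echo "${demo_out}"' _ "${demo_vars}")"

echo "Without sourcing: ${without_source}"
echo "With sourcing:    ${with_source}"
```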
# 5 Bisulfite conversion
``` bash
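# Load bash variables into memory
source .bashvars

# Build the bisulfite-converted genome and its Bowtie2 indexes.
# Note: "1>" captures stdout in the log file (despite the .stderr name);
# messages written to stderr are still printed to the console.
"${bismark_dir}/bismark_genome_preparation" \
"${genome_dir}" \
--parallel "${threads}" \
--bowtie2 \
--path_to_aligner "${bowtie2_dir}" \
1> "${genome_dir}/Peve-bs-genome.stderr"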
```
Using 20 threads for the top and bottom strand indexing processes each, so using 40 cores in total
Writing bisulfite genomes out into a single MFA (multi FastA) file
Bisulfite Genome Indexer version v0.24.0 (last modified: 19 May 2022)
Step I - Prepare genome folders - completed
Step II - Genome bisulfite conversions - completed
Bismark Genome Preparation - Step III: Launching the Bowtie 2 indexer
Building a SMALL index
Building a SMALL index
=========================================
Parallel genome indexing complete. Enjoy!
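Conceptually, Step II in the log above writes two in-silico converted copies of the genome: one with every C replaced by T, and one with every G replaced by A. The sketch below mimics this on a toy sequence with `awk`. It is not Bismark's implementation: the toy record is hypothetical, only uppercase bases are handled, and header lines are simply passed through.

``` bash
# Toy FastA record (hypothetical sequence, not from the P. evermanni genome).
toy_fasta=$'>toy_seq\nACGTCCGGA'

# Convert sequence lines only; lines starting with ">" are left untouched.
ct_converted="$(printf '%s\n' "${toy_fasta}" \
  | awk '/^>/ {print; next} {gsub(/C/, "T"); print}')"
ga_converted="$(printf '%s\n' "${toy_fasta}" \
  | awk '/^>/ {print; next} {gsub(/G/, "A"); print}')"

echo "C->T:"
echo "${ct_converted}"
echo "G->A:"
echo "${ga_converted}"
```

Each converted copy is then indexed separately, which is why the output contains one set of Bowtie2 index files per conversion.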
## 5.1 Inspect BS output
``` bash
# Load bash variables into memory
source .bashvars
tree -h ${genome_dir}/Bisulfite_Genome
```
/home/shared/8TB_HDD_01/sam/gitrepos/urol-e5/timeseries_molecular/E-Peve/data/Bisulfite_Genome
├── [4.0K] CT_conversion
│   ├── [184M] BS_CT.1.bt2
│   ├── [134M] BS_CT.2.bt2
│   ├── [442K] BS_CT.3.bt2
│   ├── [134M] BS_CT.4.bt2
│   ├── [184M] BS_CT.rev.1.bt2
│   ├── [134M] BS_CT.rev.2.bt2
│   └── [586M] genome_mfa.CT_conversion.fa
└── [4.0K] GA_conversion
    ├── [184M] BS_GA.1.bt2
    ├── [134M] BS_GA.2.bt2
    ├── [442K] BS_GA.3.bt2
    ├── [134M] BS_GA.4.bt2
    ├── [184M] BS_GA.rev.1.bt2
    ├── [134M] BS_GA.rev.2.bt2
    └── [586M] genome_mfa.GA_conversion.fa
2 directories, 14 files
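The listing shows six `.bt2` index files plus one converted FastA in each conversion directory. A check along those lines could be scripted as sketched below. The helper name `count_bt2` is hypothetical, the demo runs on a mock directory of empty files, and the expected count of six assumes a small Bowtie2 index as reported in the log for this genome.

``` bash
# Hypothetical helper: count Bowtie2 index files (*.bt2) in one conversion directory.
count_bt2() {
  find "$1" -maxdepth 1 -type f -name '*.bt2' | wc -l | tr -d ' '
}

# Mock directory that mirrors the CT_conversion layout, using empty files.
mock_dir="$(mktemp -d)/CT_conversion"
mkdir -p "${mock_dir}"
for suffix in 1 2 3 4 rev.1 rev.2; do
  touch "${mock_dir}/BS_CT.${suffix}.bt2"
done
touch "${mock_dir}/genome_mfa.CT_conversion.fa"

# Compare the observed count with the six files seen in the listing.
bt2_count="$(count_bt2 "${mock_dir}")"
if [ "${bt2_count}" -eq 6 ]; then
  echo "Index looks complete: ${bt2_count} .bt2 files"
else
  echo "Unexpected number of .bt2 files: ${bt2_count}" >&2
fi
```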
## 5.2 Compress output folder
``` bash
source .bashvars
tar -czvf ${genome_dir}/Bisulfite_Genome.tar.gz ${genome_dir}/Bisulfite_Genome
```
tar: Removing leading `/' from member names
/home/shared/8TB_HDD_01/sam/gitrepos/urol-e5/timeseries_molecular/E-Peve/data/Bisulfite_Genome/
/home/shared/8TB_HDD_01/sam/gitrepos/urol-e5/timeseries_molecular/E-Peve/data/Bisulfite_Genome/CT_conversion/
/home/shared/8TB_HDD_01/sam/gitrepos/urol-e5/timeseries_molecular/E-Peve/data/Bisulfite_Genome/CT_conversion/BS_CT.4.bt2
/home/shared/8TB_HDD_01/sam/gitrepos/urol-e5/timeseries_molecular/E-Peve/data/Bisulfite_Genome/CT_conversion/BS_CT.rev.1.bt2
/home/shared/8TB_HDD_01/sam/gitrepos/urol-e5/timeseries_molecular/E-Peve/data/Bisulfite_Genome/CT_conversion/genome_mfa.CT_conversion.fa
/home/shared/8TB_HDD_01/sam/gitrepos/urol-e5/timeseries_molecular/E-Peve/data/Bisulfite_Genome/CT_conversion/BS_CT.1.bt2
/home/shared/8TB_HDD_01/sam/gitrepos/urol-e5/timeseries_molecular/E-Peve/data/Bisulfite_Genome/CT_conversion/BS_CT.2.bt2
/home/shared/8TB_HDD_01/sam/gitrepos/urol-e5/timeseries_molecular/E-Peve/data/Bisulfite_Genome/CT_conversion/BS_CT.3.bt2
/home/shared/8TB_HDD_01/sam/gitrepos/urol-e5/timeseries_molecular/E-Peve/data/Bisulfite_Genome/CT_conversion/BS_CT.rev.2.bt2
/home/shared/8TB_HDD_01/sam/gitrepos/urol-e5/timeseries_molecular/E-Peve/data/Bisulfite_Genome/GA_conversion/
/home/shared/8TB_HDD_01/sam/gitrepos/urol-e5/timeseries_molecular/E-Peve/data/Bisulfite_Genome/GA_conversion/BS_GA.rev.1.bt2
/home/shared/8TB_HDD_01/sam/gitrepos/urol-e5/timeseries_molecular/E-Peve/data/Bisulfite_Genome/GA_conversion/BS_GA.2.bt2
/home/shared/8TB_HDD_01/sam/gitrepos/urol-e5/timeseries_molecular/E-Peve/data/Bisulfite_Genome/GA_conversion/BS_GA.4.bt2
/home/shared/8TB_HDD_01/sam/gitrepos/urol-e5/timeseries_molecular/E-Peve/data/Bisulfite_Genome/GA_conversion/genome_mfa.GA_conversion.fa
/home/shared/8TB_HDD_01/sam/gitrepos/urol-e5/timeseries_molecular/E-Peve/data/Bisulfite_Genome/GA_conversion/BS_GA.rev.2.bt2
/home/shared/8TB_HDD_01/sam/gitrepos/urol-e5/timeseries_molecular/E-Peve/data/Bisulfite_Genome/GA_conversion/BS_GA.1.bt2
/home/shared/8TB_HDD_01/sam/gitrepos/urol-e5/timeseries_molecular/E-Peve/data/Bisulfite_Genome/GA_conversion/BS_GA.3.bt2
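The `Removing leading '/'` message appears because `tar` was given an absolute path, so the archive stores the full directory hierarchy under `home/shared/...`. One possible alternative, sketched here on a mock directory, uses `tar -C` so that member names start at `Bisulfite_Genome/`. This is not the command used to produce the published archive, and the `work_dir` location is hypothetical.

``` bash
# Mock layout in a throwaway directory (stands in for ${genome_dir}).
work_dir="$(mktemp -d)"
mkdir -p "${work_dir}/Bisulfite_Genome/CT_conversion"
touch "${work_dir}/Bisulfite_Genome/CT_conversion/BS_CT.1.bt2"

# -C changes into work_dir first, so only the relative path is recorded.
tar -czf "${work_dir}/Bisulfite_Genome.tar.gz" -C "${work_dir}" Bisulfite_Genome

# List archive members; the first entry is the top-level folder.
first_member="$(tar -tzf "${work_dir}/Bisulfite_Genome.tar.gz" | head -n 1)"
echo "${first_member}"
```

An archive built this way extracts to a single `Bisulfite_Genome` folder in the current directory rather than recreating the original path.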
## 5.3 Create MD5sum
``` bash
source .bashvars
cd ${genome_dir}
md5sum Bisulfite_Genome.tar.gz | tee Bisulfite_Genome.tar.gz.md5
```
1f65833895ba3c8d50fe27bb8ad5303e Bisulfite_Genome.tar.gz
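Anyone who downloads the tarball can compare it against the saved checksum file with `md5sum -c`. The sketch below demonstrates the round trip on a placeholder file; the file name `demo.tar.gz` is hypothetical, and GNU coreutils `md5sum` is assumed.

``` bash
# Create a placeholder file and its checksum in a throwaway directory.
md5_dir="$(mktemp -d)"
(
  cd "${md5_dir}"
  printf 'demo\n' > demo.tar.gz   # placeholder content, not a real archive
  md5sum demo.tar.gz > demo.tar.gz.md5
)

# Verify the file against the stored checksum; md5sum -c exits non-zero on mismatch.
if (cd "${md5_dir}" && md5sum -c demo.tar.gz.md5); then
  check_status="ok"
else
  check_status="failed"
fi
echo "Checksum verification: ${check_status}"
```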
# 6 REFERENCES
Krueger, Felix, and Simon R. Andrews. 2011. “Bismark: A Flexible Aligner
and Methylation Caller for Bisulfite-Seq Applications.” *Bioinformatics*
27 (11): 1571–72.