Background
This R Markdown file uses Bismark (Krueger and Andrews 2011) to build a bisulfite-converted genome from the Apulchra-genome.fa file, for use in later Bismark alignments.
The genome FastA was taken from the deep-dive-expression Wiki (GitHub Wiki), but I'm not sure of its exact origin.
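Because the output files are large, they cannot be synced to GitHub. Instead, the output directories will be gzipped and made available here:
- https://gannet.fish.washington.edu/gitrepos/urol-e5/timeseries_molecular/D-Apul/data/Apulchra-genome-bisulfite.tar.gz (1.3GB)
- https://gannet.fish.washington.edu/gitrepos/urol-e5/timeseries_molecular/D-Apul/data/Apulchra-genome-bisulfite.tar.gz.md5
  - MD5: 9e1db5875f210007a43e2083f01c2db9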
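Since the outputs are distributed as a large archive, it is worth confirming that a download is intact before unpacking it. A minimal sketch, assuming GNU `md5sum` and `awk` are available; `verify_md5` is a hypothetical helper name, not part of this notebook:

```shell
# Hypothetical helper: succeed only if the file's MD5 matches the expected hex digest.
verify_md5() {
  local file="$1" expected="$2" actual
  # md5sum prints "<digest>  <filename>"; keep only the digest.
  actual="$(md5sum "${file}" | awk '{print $1}')"
  if [ "${actual}" = "${expected}" ]; then
    return 0
  fi
  echo "MD5 mismatch for ${file}: got ${actual}, expected ${expected}" >&2
  return 1
}
```

It would be called with the path to the downloaded tarball and the published digest, for example `verify_md5 path/to/archive.tar.gz "<published digest>" && tar -xzf path/to/archive.tar.gz`, where the path is a placeholder.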
Outputs
- CT conversion
  - Bowtie2 index files.
  - CT conversion FastA.
- GA conversion
  - Bowtie2 index files.
  - GA conversion FastA.
Create a Bash variables file
This allows usage of Bash variables across R Markdown chunks.
{
echo "#### Assign Variables ####"
echo ""
echo "# Data directories"
echo 'export timeseries_dir=/home/shared/8TB_HDD_01/sam/gitrepos/urol-e5/timeseries_molecular'
echo 'export output_dir_top=${timeseries_dir}/D-Apul/data'
echo 'export genome_dir=${timeseries_dir}/D-Apul/data'
echo ""
echo "# Paths to programs"
echo 'export programs_dir="/home/shared"'
echo 'export bismark_dir="${programs_dir}/Bismark-0.24.0"'
echo 'export bowtie2_dir="${programs_dir}/bowtie2-2.4.4-linux-x86_64"'
echo ""
echo "# Set number of CPUs to use"
echo 'export threads=20'
echo ""
echo "# Print formatting"
echo 'export line="--------------------------------------------------------"'
echo ""
} > .bashvars
cat .bashvars
#### Assign Variables ####
# Data directories
export timeseries_dir=/home/shared/8TB_HDD_01/sam/gitrepos/urol-e5/timeseries_molecular
export output_dir_top=${timeseries_dir}/D-Apul/data
export genome_dir=${timeseries_dir}/D-Apul/data
# Paths to programs
export programs_dir="/home/shared"
export bismark_dir="${programs_dir}/Bismark-0.24.0"
export bowtie2_dir="${programs_dir}/bowtie2-2.4.4-linux-x86_64"
# Set number of CPUs to use
export threads=20
# Print formatting
export line="--------------------------------------------------------"
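The reason for the variables file is that each bash chunk runs in its own shell process, so variables set in one chunk are gone by the next. The sketch below imitates two chunks with `bash -c`. The names `vars_file`, `demo_dir` and `demo_sub` are made up for illustration and stand in for `.bashvars` and its contents.

```shell
# Hypothetical stand-in for .bashvars, written to a temporary file.
vars_file="$(mktemp)"
{
echo 'export demo_dir=/tmp/demo'
# Single quotes keep ${demo_dir} literal, so it is expanded when the file is sourced.
echo 'export demo_sub=${demo_dir}/data'
} > "${vars_file}"

# "Chunk" that does not source the file: the variable is not defined.
without="$(bash -c 'echo "${demo_sub:-unset}"')"

# "Chunk" that sources the file first: the variable is defined and expanded.
with="$(bash -c 'source "$1"; echo "${demo_sub}"' _ "${vars_file}")"

echo "without source: ${without}"
echo "with source:    ${with}"
rm -f "${vars_file}"
```

This is why every later chunk begins with `source .bashvars`.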
Bisulfite conversion
# Load bash variables into memory
source .bashvars
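# Build the bisulfite-converted (CT and GA) genomes and their Bowtie2 indexes.
# Note: the CT and GA indexing jobs already run side by side, so
# --parallel ${threads} uses roughly 2 x ${threads} threads in total.
"${bismark_dir}"/bismark_genome_preparation \
"${genome_dir}" \
--parallel "${threads}" \
--bowtie2 \
--path_to_aligner "${bowtie2_dir}"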
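Genome preparation expects a directory holding the genome FastA, so a quick pre-flight check can save a failed run. A minimal sketch, assuming the genome file ends in `.fa` or `.fasta` (optionally gzipped) and that `find` supports `-maxdepth`; `has_fasta` is a hypothetical helper, not a Bismark command:

```shell
# Hypothetical helper: succeed if the directory directly contains at least one FastA file.
has_fasta() {
  local dir="$1"
  # Look only at the top level of the directory; symlinked genome files also match.
  find "${dir}" -maxdepth 1 \
    \( -name '*.fa' -o -name '*.fasta' -o -name '*.fa.gz' -o -name '*.fasta.gz' \) \
    | grep -q .
}
```

In this notebook it could be run as `has_fasta "${genome_dir}" || echo "No FastA found in ${genome_dir}"` after sourcing `.bashvars`.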
Inspect BS output
# Load bash variables into memory
source .bashvars
tree -h ${genome_dir}/Bisulfite_Genome
/home/shared/8TB_HDD_01/sam/gitrepos/urol-e5/timeseries_molecular/D-Apul/data/Bisulfite_Genome
├── [4.0K] CT_conversion
│   ├── [169M] BS_CT.1.bt2
│   ├── [124M] BS_CT.2.bt2
│   ├── [1.5K] BS_CT.3.bt2
│   ├── [124M] BS_CT.4.bt2
│   ├── [169M] BS_CT.rev.1.bt2
│   ├── [124M] BS_CT.rev.2.bt2
│   └── [504M] genome_mfa.CT_conversion.fa
└── [4.0K] GA_conversion
    ├── [169M] BS_GA.1.bt2
    ├── [124M] BS_GA.2.bt2
    ├── [1.5K] BS_GA.3.bt2
    ├── [124M] BS_GA.4.bt2
    ├── [169M] BS_GA.rev.1.bt2
    ├── [124M] BS_GA.rev.2.bt2
    └── [504M] genome_mfa.GA_conversion.fa
2 directories, 14 files
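The listing above can also be checked programmatically instead of by eye. A minimal sketch, assuming the file names shown in the listing (small `.bt2` indexes, not the `.bt2l` large-index variant); `check_bs_genome` is a hypothetical helper name:

```shell
# Hypothetical helper: report any expected file missing from a Bisulfite_Genome directory.
check_bs_genome() {
  local bs_dir="$1" conv suffix file missing=0
  for conv in CT GA; do
    # Six Bowtie2 index files per conversion.
    for suffix in 1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2; do
      file="${bs_dir}/${conv}_conversion/BS_${conv}.${suffix}"
      [ -f "${file}" ] || { echo "missing: ${file}" >&2; missing=1; }
    done
    # One converted FastA per conversion.
    file="${bs_dir}/${conv}_conversion/genome_mfa.${conv}_conversion.fa"
    [ -f "${file}" ] || { echo "missing: ${file}" >&2; missing=1; }
  done
  return "${missing}"
}
```

After sourcing `.bashvars`, it could be run as `check_bs_genome "${genome_dir}/Bisulfite_Genome"`. It returns non-zero if any of the 14 files is absent.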