--- title: "GeoduckJuvOAmbdseq" output: html_document --- This notebook was started running Rstudio server on emu- microsoft open R Analysis on Bismark alignments of 52 RRBS juvenile OA treated geoduck to v070 genome with score_min L,0,-1.2 I 60 Install methylkit on Srlab on Emu ```{bash, eval = FALSE} ###ran these commands from the terminal sudo chmod -R 777 /home/srlab/R/x86_64-pc-linux-gnu-library/3.5 sudo chmod -R 777 /opt/microsoft/ropen/3.5.1/lib64/R/library ``` ```{r} remove.packages("BiocInstaller", lib=.libPaths()) if (!requireNamespace("BiocManager")) install.packages("BiocManager") BiocManager::install() biocLite("methylKit") ``` Load libraries ```{r} library(methylKit) library(ggplot2) ``` ```{bash} pwd ``` Download sorted deduplicated files ```{bash} curl -O http://gannet.fish.washington.edu/metacarcinus/Pgenerosa/20181101/dedup_sorted_bam.tar.gz ``` unzip files ```{bash} ###I ran this from the command line, not in R, because in R it errored out and showed incomplete decompression. It was also much faster by command line tar xvzf /home/srlab/GitHub/Shelly_Pgenerosa/analyses/methylkit_JuviPgenr_allData/dedup_sorted_bam.tar.gz ``` ## Assessing coverage and percent methylation with methylkit Create file list for reading in Bismark alignments (deduplicated sorted bam files) ```{r} file.list=c(dir("~/GitHub/Shelly_Pgenerosa/analyses/methylkit_JuviPgenr_allData/dedup_sorted_bam/",pattern = "_dedup.sorted.bam")) file.list <- paste("~/GitHub/Shelly_Pgenerosa/analyses/methylkit_JuviPgenr_allData/dedup_sorted_bam/", file.list, sep = "") file.list <- as.list(file.list) ``` Read in treatment info for these samples ```{r} treatment <- read.csv("~/GitHub/Shelly_Pgenerosa/data/Treatment_info_CORRECTED.csv", header = TRUE) #subset dataframe to exclude lines with no info in them treatment <- treatment[1:52,] #create a list of lists (the format that methylkit expects) for sample ids li <- list() for (i in 1:length(treatment$sampleno)) li[[i]] = as.character(treatment$sampleno[i]) #order treatment data frame by treatment to be able to add replicate numbers to each sample treatment <- treatment[sort(treatment$treatment),] #create replicate numbers (1-4) for each treatment (13) treatment$replicate <- rep(1:4,13) ``` ```{bash} curl -O http://gannet.fish.washington.edu/metacarcinus/Pgenerosa/20181101/methylkit_20181126/myobj ``` Read methylation calls from Bismark alignments. Ran 15 min/sample ```{r, eval = FALSE} myobj <- processBismarkAln(location = file.list, sample.id = li, assembly = "v3", read.context="CpG", mincov=3, treatment = treatment$treatment) ``` ```{r} load("~/GitHub/Shelly_Pgenerosa/analyses/methylkit_JuviPgenr_allData/myobj") ``` calculate region coverage ```{r} ##started running this at 10am Thursday, Dec 6 and stopped it Tuesday Dec. 11 at 8:44am. tiles <- tileMethylCounts(myobj,win.size=1000,step.size=1000, mc.cores = 3) ``` find regions covered by all samples (This finds regions that are covered by two or more samples I believe, because the covereage has NAs for some samples.) ```{r} mmeth <- unite(tiles, min.per.group = 1L) ```