--- title: "Proportion Test" author: "Yaamini Venkataraman" date: "1/15/2019" output: html_document --- I will use a proportion test to compare the proportion of genome feature overlaps between differentially methylated loci (DML), differentially methylated regions (DMR), and the gene background. ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` # Install packages ```{r} #install.packages("RColorBrewer") require(RColorBrewer) #Load RColorBrewer #install.packages("dichromat") require(dichromat) ``` # Obtain session information ```{r} sessionInfo() ``` # Import data ```{r} overlapData <- read.csv("2019-01-15-Overlap-Proportions.csv", header = TRUE) rownames(overlapData) <- overlapData$genomicFeature #Set genomic feature indication and rownames overlapData <- overlapData[,-1] #Remove genomic feature indication column head(overlapData) #Confirm import ``` ## Reformat data ```{r} proportionData <- overlapData #Copy overlap data as a new dataframe nLength <- length(proportionData$totalCpG) #Count number of rows for(i in 1:nLength) { proportionData[i,] <- (proportionData[i,]/proportionData[6,])*100 } #Divide each column of proportionData by respective totalLines entry. Multiply by 100 and and save the percentage head(proportionData) #Confirm changes ``` ```{r} proportionData <- proportionData[-6,] #Remove totalLines row tail(proportionData) #Confirm changes ``` # DML ## Conduct chi-squared tests of homogeneity The null hypothesis is that loci distributions in the genome are the same between different categories. ### Total CpGs vs. all CpGs with 5x coverage enriched by MBD treatment I want to see if MBD enriched CpGs in certain areas of the genome. ```{r} totalVersusEnriched <- overlapData[, 1:2] #Subset data for statistical testing totalVersusEnriched <- totalVersusEnriched[-6, ] #Remove totalLines row head(totalVersusEnriched) #Confirm changes ``` ```{r} totalVersusEnrichedTest <- chisq.test(t(totalVersusEnriched)) #Conduct a chi-squared test totalVersusEnrichedTest #The distribution of enriched CpGs from MBD is different from the distribution of all CG motifs in the C. virginica genome ``` ```{r} totalVersusEnrichedTest$observed #Look at observed valules totalVersusEnrichedTest$expected #Look at expected values totalVersusEnrichedTest$residuals #Look at residuals ``` ### Total CpGs vs. methylated CpGs ```{r} totalVersusMethylated <- overlapData[, c(1,3)] #Subset data for statistical testing totalVersusMethylated <- totalVersusMethylated[-6, ] #Remove totalLines row head(totalVersusMethylated) #Confirm changes ``` ```{r} totalVersusMethylatedTest <- chisq.test(t(totalVersusMethylated)) #Conduct a chi-squared test totalVersusMethylatedTest #The distribution of methylated CpGs is significantly different from all CpG ``` ### Enriched CpGs vs. methylated CpGs Are methylated CpGs more populous in certain genomic regions? ```{r} enrichedVersusMethylated <- overlapData[, 2:3] #Subset data for statistical testing enrichedVersusMethylated <- enrichedVersusMethylated[-6, ] #Remove totalLines row head(enrichedVersusMethylated) #Confirm changes ``` ```{r} enrichedVersusMethylatedTest <- chisq.test(t(enrichedVersusMethylated)) #Conduct a chi-squared test enrichedVersusMethylatedTest #The distribution of methylated CpGs is significantly different from CpGs enriched by MBD ``` ### Enriched CpGs vs. methylated CpGs vs. sparsely methylated CpGs vs. unmethylated CpGs ```{r} allMethylationCategories <- overlapData[, 2:5] #Subset data for statistical testing allMethylationCategories <- allMethylationCategories[-6, ] #Remove totalLines row head(allMethylationCategories) #Confirm changes ``` ```{r} allMethylationCategoriesTest <- chisq.test(t(allMethylationCategories)) #Conduct a chi-squared test allMethylationCategoriesTest #The distributions differ from eachother. ``` ### Enriched CpGs vs. DML The MBD-Enriched loci are the background for DML. ```{r} enrichedVersusDML <- overlapData[,c(2,6)] #Subset data for statistical testing enrichedVersusDML <- enrichedVersusDML[-6,] #Remove totalLines row head(enrichedVersusDML) #Confirm changes ``` ```{r} enrichedVersusDMLTest <- chisq.test(t(enrichedVersusDML)) #Conduct a chi-squared test enrichedVersusDMLTest #The distribution of DML is significantly different from methylated CpGs. ``` ### Methylated CpGs vs. DML Even though the methylated CpGs are not the background of the DML, it's still an interesting comparison. ```{r} methylatedVersusDML <- overlapData[,c(3,6)] #Subset data for statistical testing methylatedVersusDML <- methylatedVersusDML[-6,] #Remove totalLines row head(methylatedVersusDML) #Confirm changes ``` ```{r} methylatedVersusDMLTest <- chisq.test(t(methylatedVersusDML)) #Conduct a chi-squared test methylatedVersusDMLTest #The distribution of DML is significantly different from methylated CpGs. ``` ## Create figures ### Define color scheme ```{r} RColorBrewer::display.brewer.all() #Show all RColorBrewer palettes. I will choose greens. plotColors <- rev(RColorBrewer::brewer.pal(5, "GnBu")) #Create a color palette for the barplots. Use 5 green-blue shades from RColorBrewer. Reverse theorder so the darkest shade is used first. barplot(t(t(proportionData)), col = plotColors) #See what plot looks like with new scheme barplot(t(t(proportionData)), col = dichromat(plotColors)) #Check that the plot colors will be easy to interpret for those with color blindess ``` ### All CpG types ```{r} allCpGCategories <- proportionData[,1:5] #Subset data head(allCpGCategories) #Confirm subset ``` ```{r} #pdf("2019-04-10-All-CpG-Categories.pdf", height = 8.5, width = 11) par(mar = c(3,5,1,1), oma = c(1, 1, 1, 10)) #Change figure boundaries barplot(t(t(allCpGCategories)), axes = FALSE, names.arg = c("All CpGs", "MBD-Enriched", "Methylated", "Sparsely Methylated", "Unmethylated"), ylim = c(0,100), col = plotColors) #Create a stacked barplot. Use axes = FALSE to remove the y-axis and names.arg to set labels on the x-axis. axis(side = 2, at = seq(0, 100, by = 10), las = 2, col = "grey80") mtext(side = 2, "Proportion CpGs", line = 3) #Add y-axis label par(fig = c(0, 1, 0, 1), oma = c(0, 0, 0, 0), mar = c(0, 0, 0, 0), new = TRUE) #Create new plot plot(0, 0, type = "n", bty = "n", xaxt = "n", yaxt = "n") #Add new plot on top of current plot legend("topright", xpd = TRUE, legend = c("Exons", "Introns", "Transposable Elements", "Putative Promoters", "Other"), pch = 22, col = "black", pt.bg = plotColors, bty = "n") #Place a legend in the top right of the figure with no box #dev.off() ``` ### Total CpGs vs. enriched CpGs ```{r} totalVersusEnrichedProportions <- proportionData[,1:2] #Subset data head(totalVersusEnrichedProportions) #Confirm subset ``` ```{r} #pdf("2019-04-10-Total-Versus-Enriched-CpGs.pdf", height = 8.5, width = 11) par(mar = c(3,5,1,1)) #Change figure boundaries barplot(t(totalVersusEnrichedProportions), beside = TRUE, axes = FALSE, names.arg = c("Exons", "Introns", "TE", "Promoters", "Other"), col = c(plotColors[2], plotColors[5]), ylim = c(0,50)) #Create a grouped barplot (beside = TRUE) using a transposed version of the proportion data. Use axes = FALSE to remove the y-axis and names.arg to set labels on the x-axis. axis(side = 2, at = seq(0, 100, by = 5), las = 2, col = "grey80") mtext(side = 2, "Proportion CpGs", line = 3) #Add y-axis label legend("topright", legend = c("All CpGs", "MBD-Enriched"), pch = 16, col = c(plotColors[2], plotColors[5]), bty = "n") #Place a legend in the top right of the figure with no box #dev.off() ``` ### All CpGs vs. Methylated CpGs ```{r} allCpGVersusMethylated <- proportionData[,c(1,3)] #Subset data head(allCpGVersusMethylated) #Confirm subset ``` ```{r} #pdf("2019-04-10-All-CpGs-Versus-Methylated-CpGs.pdf", height = 8.5, width = 11) par(mar = c(3,5,1,1), oma = c(1, 1, 1, 15)) #Change figure boundaries barplot(t(t(allCpGVersusMethylated)), axes = FALSE, names.arg = c("All CpGs", "Methylated CpGs"), cex.names = 1.5, ylim = c(0,100), col = plotColors) #Create a stacked barplot. Use axes = FALSE to remove the y-axis and names.arg to set labels on the x-axis. axis(side = 2, at = seq(0, 100, by = 10), las = 2, col = "grey80", cex.axis = 1.2) mtext(side = 2, "Proportion CpGs", line = 3, cex = 1.5) #Add y-axis label par(fig = c(0, 1, 0, 1), oma = c(0, 0, 0, 0), mar = c(0, 0, 0, 0), new = TRUE) #Create new plot plot(0, 0, type = "n", bty = "n", xaxt = "n", yaxt = "n") #Add new plot on top of current plot legend("topright", xpd = TRUE, legend = c("Exons", "Introns", "Transposable Elements", "Putative Promoters", "Other"), pch = 22, col = "black", pt.bg = plotColors, bty = "n", cex = 1.5) #Place a legend in the top right of the figure with no box #dev.off() ``` ### Enriched CpGs vs. methylated CpGs ```{r} enrichedVersusMethylatedProportions <- proportionData[,2:3] #Subset data head(enrichedVersusMethylatedProportions) #Confirm subset ``` ```{r} #pdf("2019-04-10-Total-Versus-Methylated-CpGs.pdf", height = 8.5, width = 11) par(mar = c(3,5,1,1)) #Change figure boundaries barplot(t(enrichedVersusMethylatedProportions), beside = TRUE, axes = FALSE, names.arg = c("Exons", "Introns", "TE", "Promoters", "Other"), col = c(plotColors[2], plotColors[5]), ylim = c(0,50)) #Create a grouped barplot (beside = TRUE) using a transposed version of the proportion data. Use axes = FALSE to remove the y-axis and names.arg to set labels on the x-axis. axis(side = 2, at = seq(0, 50, by = 5), las = 2, col = "grey80") mtext(side = 2, "Proportion CpGs", line = 3) #Add y-axis label legend("topright", legend = c("MBD-Enriched", "Methylated"), pch = 16, col = c(plotColors[2], plotColors[5]), bty = "n") #Place a legend in the top right of the figure with no box #dev.off() ``` ### Enriched CpGs vs. methylated vs. sparsely methylated vs. unmethylated ```{r} allMethylationCategoriesProportions <- proportionData[,2:5] #Subset data head(allMethylationCategoriesProportions) #Confirm subset ``` ```{r} #pdf("2019-04-10-All-Methylation-Categories.pdf", height = 8.5, width = 11) par(mar = c(3,5,1,1), oma = c(1, 1, 1, 10)) #Change figure boundaries barplot(t(t(allMethylationCategoriesProportions)), axes = FALSE, names.arg = c("MBD-Enriched", "Methylated", "Sparsely Methylated", "Unmethylated"), ylim = c(0,100), col = plotColors) #Create a stacked barplot. Use axes = FALSE to remove the y-axis and names.arg to set labels on the x-axis. axis(side = 2, at = seq(0, 100, by = 10), las = 2, col = "grey80") mtext(side = 2, "Proportion CpGs", line = 3) #Add y-axis label par(fig = c(0, 1, 0, 1), oma = c(0, 0, 0, 0), mar = c(0, 0, 0, 0), new = TRUE) #Create new plot plot(0, 0, type = "n", bty = "n", xaxt = "n", yaxt = "n") #Add new plot on top of current plot legend("topright", xpd = TRUE, legend = c("Exons", "Introns", "Transposable Elements", "Putative Promoters", "Other"), pch = 22, col = "black", pt.bg = plotColors, bty = "n") #Place a legend in the top right of the figure with no box #dev.off() ``` ### Enriched CpGs vs. DML ```{r} enrichedVersusDMLProportions <- proportionData[,c(2,6)] #Subset data head(enrichedVersusDMLProportions) #Confirm subset ``` ```{r} #pdf("2019-04-10-Enriched-Versus-DML.pdf", height = 8.5, width = 11) par(mar = c(3,5,1,1), oma = c(1, 1, 1, 15)) #Change figure boundaries barplot(t(t(enrichedVersusDMLProportions)), axes = FALSE, names.arg = c("Loci with 5x Coverage", "DML"), cex.names = 1.5, ylim = c(0,100), col = plotColors) #Create a stacked barplot. Use axes = FALSE to remove the y-axis and names.arg to set labels on the x-axis. axis(side = 2, at = seq(0, 100, by = 10), las = 2, col = "grey80", cex.axis = 1.2) #Add y-axis mtext(side = 2, "Proportion CpGs", line = 3, cex = 1.5) #Add y-axis label par(fig = c(0, 1, 0, 1), oma = c(0, 0, 0, 0), mar = c(0, 0, 0, 0), new = TRUE) #Create new plot plot(0, 0, type = "n", bty = "n", xaxt = "n", yaxt = "n") #Add new plot on top of current plot legend("topright", xpd = TRUE, legend = c("Exons", "Introns", "Transposable Elements", "Putative Promoters", "Other"), pch = 22, col = "black", pt.bg = plotColors, bty = "n", cex = 1.5) #Place a legend in the top right of the figure with no box. pch = 22 has a background and outline. col changes the outline, pt.bg changes the point fill #dev.off() ``` ### Methylated CpGs vs. DML ```{r} methylatedVersusDMLProportions <- proportionData[,c(3,6)] #Subset data head(methylatedVersusDMLProportions) #Confirm subset ``` ```{r} #pdf("2019-04-10-Methylated-CpGs-Versus-DML.pdf", height = 8.5, width = 11) par(mar = c(3,5,1,1)) #Change figure boundaries barplot(t(methylatedVersusDMLProportions), axes = FALSE, names.arg = c("Exons", "Introns", "TE", "Promoters", "Other"), ylim = c(0,65)) #Create a grouped barplot (beside = TRUE) using a transposed version of the proportion data. Use axes = FALSE to remove the y-axis and names.arg to set labels on the x-axis. axis(side = 2, at = seq(0, 65, by = 5), las = 2, col = "grey80") mtext(side = 2, "Proportion CpGs", line = 3) #Add y-axis label legend("topright", legend = c("Methylated CpGs", "DML"), pch = 16, col = c("grey20", "grey80"), bty = "n") #Place a legend in the top right of the figure with no box #dev.off() ``` # DMR ## Conduct chi-squared test ### 100 bp tile background vs. DMR The 100 bp tiles are every possible DMR, so this is the appropriate background. ```{r} backgroundVersusDMR <- overlapData[,c(7,8)] #Subset data for statistical testing backgroundVersusDMR <- backgroundVersusDMR[-6,] #Remove totalLines row head(backgroundVersusDMR) #Confirm changes ``` ```{r} backgroundVersusDMRTest <- chisq.test(t(backgroundVersusDMR)) #Conduct a chi-squared test backgroundVersusDMRTest #The distribution of DML is significantly different from methylated CpGs. ``` ## Create figures ### 100 bp tile background vs. DMR ```{r} backgroundVersusDMRProportions <- proportionData[,c(8,7)] #Subset data. Column 8 is the DMRBackground, and column 7 are the DMR. head(backgroundVersusDMRProportions) #Confirm subset ``` ```{r} #pdf("2019-06-05-Background-Versus-DMR.pdf", height = 8.5, width = 11) par(mar = c(3,5,1,1)) #Change figure boundaries barplot(t(backgroundVersusDMRProportions), beside = TRUE, axes = FALSE, names.arg = c("Exons", "Introns", "TE", "Promoters", "Other"), ylim = c(0,75)) #Create a grouped barplot (beside = TRUE) using a transposed version of the proportion data. Use axes = FALSE to remove the y-axis and names.arg to set labels on the x-axis. axis(side = 2, at = seq(0, 75, by = 5), las = 2, col = "grey80") mtext(side = 2, "Proportion 100 bp Regions", line = 3) #Add y-axis label legend("topright", legend = c("All 100 bp Regions", "DMR"), pch = 16, col = c("grey20", "grey80"), bty = "n") #Place a legend in the top right of the figure with no box #dev.off() ```