{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Genomic Location of DML" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this notebook, I will identify the genomic locations of [differentially methylated loci (DML) identified with `methylKit`](https://github.com/RobertsLab/project-oyster-oa/blob/master/code/Haws/04-methylKit.R). I will:\n", "\n", "1. Create BEDfiles for DML\n", "2. Identify overlaps between pH- and ploidy-DML\n", "3. Identify overlaps between SNPs and DML\n", "4. Characterize genomic locations for DML" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 0. Set working directory" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'/Users/yaaminivenkataraman/Documents/project-oyster-oa/code/Haws'" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pwd" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Users/yaaminivenkataraman/Documents/project-oyster-oa/analyses\n" ] } ], "source": [ "cd ../../analyses/" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "#mkdir Haws_07-DML-characterization" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Users/yaaminivenkataraman/Documents/project-oyster-oa/analyses/Haws_07-DML-characterization\n" ] } ], "source": [ "cd Haws_07-DML-characterization/" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/opt/homebrew/bin/intersectBed\r\n" ] } ], "source": [ "!which intersectBed" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "bedtoolsDirectory = \"/opt/homebrew/bin/\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. 
Create BEDfiles for DML" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "My DML lists are `.csv` files. To identify genomic locations with `bedtools intersect`, I need BEDfiles." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ",chr,start,end,strand,pvalue,qvalue,meth.diff\r\n", "49125,NC_047559.1,5294172,5294174,*,6.81863140326384e-14,8.40451241428843e-08,40.2560083594566\r\n", "885150,NC_047560.1,65604843,65604845,*,3.34714016321879e-07,0.00946585966971852,49.4839101396478\r\n", "888648,NC_047560.1,66080783,66080785,*,2.24994610517064e-07,0.00775731411903371,-51.6483516483517\r\n", "923332,NC_047560.1,72583152,72583154,*,6.60249993674503e-23,1.62762259467936e-16,-40\r\n", "1008760,NC_047561.1,7843128,7843130,*,5.49971909095006e-08,0.0032137058072851,-26.3157894736842\r\n", "1035367,NC_047561.1,10147466,10147468,*,5.73605741393552e-08,0.0032137058072851,-30.4647676161919\r\n", "1035580,NC_047561.1,10166213,10166215,*,1.68763140575909e-09,0.000393983826907299,-29.1507066437723\r\n", "1047890,NC_047561.1,11783086,11783088,*,1.4461592764831e-09,0.000393983826907299,-44.1576698155646\r\n", "1103577,NC_047561.1,16521359,16521361,*,1.50728082250528e-09,0.000393983826907299,28.8444735692442\r\n" ] } ], "source": [ "#Look at csv file to determine what modifications need to be made\n", "#Column 2: chr, Column 3: start, Column 4: end, Column 8: meth.diff\n", "!head ../Haws_04-methylKit/DML/DML-pH-25-Cov5.csv" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "../Haws_04-methylKit/DML/DML-pH-25-Cov5.csv\r\n", "../Haws_04-methylKit/DML/DML-ploidy-25-Cov5.csv\r\n" ] } ], "source": [ "#Will use 25% meth diff cutoff for DML definition\n", "!find ../Haws_04-methylKit/DML/DML*25*" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "%%bash\n", "\n", "#Replace 
, with tabs\n", "#Fields are unquoted, so no quote removal is needed (this could also be done in R)\n", "#Print chr, start, end, meth.diff\n", "#Remove header\n", "#Save as BEDfile\n", "\n", "for f in ../Haws_04-methylKit/DML/DML*25*\n", "do\n", " tr \",\" \"\\t\" < ${f} \\\n", " | awk '{print $2\"\\t\"$3\"\\t\"$4\"\\t\"$8}' \\\n", " | tail -n+2 \\\n", " > ${f}.bed\n", "done" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "%%bash\n", "\n", "#Move BEDfiles to current working directory\n", "mv ../Haws_04-methylKit/DML/*bed ." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "==> DML-pH-25-Cov5.csv.bed <==\r\n", "NC_047559.1\t5294172\t5294174\t40.2560083594566\r\n", "NC_047560.1\t65604843\t65604845\t49.4839101396478\r\n", "NC_047560.1\t66080783\t66080785\t-51.6483516483517\r\n", "NC_047560.1\t72583152\t72583154\t-40\r\n", "NC_047561.1\t7843128\t7843130\t-26.3157894736842\r\n", "NC_047561.1\t10147466\t10147468\t-30.4647676161919\r\n", "NC_047561.1\t10166213\t10166215\t-29.1507066437723\r\n", "NC_047561.1\t11783086\t11783088\t-44.1576698155646\r\n", "NC_047561.1\t16521359\t16521361\t28.8444735692442\r\n", "NC_047561.1\t19286180\t19286182\t-55.4137931034483\r\n", "\r\n", "==> DML-ploidy-25-Cov5.csv.bed <==\r\n", "NC_047559.1\t12799610\t12799612\t27.7297297297297\r\n", "NC_047559.1\t22468723\t22468725\t28.4117647058823\r\n", "NC_047559.1\t44801744\t44801746\t34.0988480118915\r\n", "NC_047559.1\t53732861\t53732863\t25.8426966292135\r\n", "NC_047561.1\t9365798\t9365800\t34.0129358830146\r\n", "NC_047561.1\t28489237\t28489239\t-25.6018518518519\r\n", "NC_047561.1\t40362698\t40362700\t29.4117647058824\r\n", "NC_047563.1\t39926052\t39926054\t42.6872058194266\r\n", "NC_047564.1\t23049738\t23049740\t29.2845880961766\r\n", "NC_047564.1\t24426622\t24426624\t-30.0865800865801\r\n" ] } ], "source": [ "!head *bed" ] }, { "cell_type": "code", "execution_count": 20, "metadata": 
{}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_047561.1\t40362698\t40362700\t-31.0344827586207\n", " 1 DML-Cov5-Overlaps.bed\n" ] } ], "source": [ "#Find overlaps between pH- and ploidy-DML\n", "#With -u, each overlapping pH-DML is reported once, so column 4 holds the pH meth.diff\n", "#Check head\n", "#Count number of overlapping DML\n", "!{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a DML-pH-25-Cov5.csv.bed \\\n", "-b DML-ploidy-25-Cov5.csv.bed \\\n", "> DML-Cov5-Overlaps.bed\n", "!head DML-Cov5-Overlaps.bed\n", "!wc -l DML-Cov5-Overlaps.bed" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "I imported the BEDfiles into [this IGV session]() to visualize them." ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## 2. SNP overlap" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I will now look at overlaps between DML and unique C->T SNPs. In bisulfite sequencing data, a C->T SNP can be mistaken for an unmethylated C that was converted to T, so a DML at a SNP may reflect genotype rather than methylation. After counting how many DML in each list overlap a SNP, I'll remove those DML before downstream analyses." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2a. 
Create a BEDfile for C->T SNPs" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_001276.1\t12440\t.\tC\tT\r\n", "NC_001276.1\t7226\t.\tC\tT\r\n", "NC_047559.1\t10001065\t.\tC\tT\r\n", "NC_047559.1\t10001128\t.\tC\tT\r\n", "NC_047559.1\t1000226\t.\tC\tT\r\n", "NC_047559.1\t10004318\t.\tC\tT\r\n", "NC_047559.1\t100045\t.\tC\tT\r\n", "NC_047559.1\t10004558\t.\tC\tT\r\n", "NC_047559.1\t10005322\t.\tC\tT\r\n", "NC_047559.1\t10005684\t.\tC\tT\r\n" ] } ], "source": [ "!head /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/unique-CT-SNPs.tab" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_001276.1\t12440\t12440\r\n", "NC_001276.1\t7226\t7226\r\n", "NC_047559.1\t10001065\t10001065\r\n", "NC_047559.1\t10001128\t10001128\r\n", "NC_047559.1\t1000226\t1000226\r\n", "NC_047559.1\t10004318\t10004318\r\n", "NC_047559.1\t100045\t100045\r\n", "NC_047559.1\t10004558\t10004558\r\n", "NC_047559.1\t10005322\t10005322\r\n", "NC_047559.1\t10005684\t10005684\r\n" ] } ], "source": [ "#Convert SNPs to BED format (chr, start, end), using the SNP position as both start and end\n", "#Note: BED is 0-based and half-open; if these positions are 1-based (as in VCF), a strict conversion would use $2-1 as the start\n", "!awk '{print $1\"\\t\"$2\"\\t\"$2}' /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/unique-CT-SNPs.tab \\\n", "> /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/unique-CT-SNPs.bed\n", "!head /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/unique-CT-SNPs.bed" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2b. 
Overlaps with Unique C/T SNPs" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_047560.1\t65604843\t65604845\t49.4839101396478\n", "NC_047561.1\t7843128\t7843130\t-26.3157894736842\n", "NC_047561.1\t10166213\t10166215\t-29.1507066437723\n", "NC_047561.1\t39008886\t39008888\t-35.8974358974359\n", "NC_047567.1\t15896903\t15896905\t-28.3455405508507\n", "NC_047568.1\t46593770\t46593772\t-26.1194029850746\n", " 6 DML-pH-25-Cov5-unique-CT-SNPs.bed\n" ] } ], "source": [ "!{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a DML-pH-25-Cov5.csv.bed \\\n", "-b /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/unique-CT-SNPs.bed \\\n", "> DML-pH-25-Cov5-unique-CT-SNPs.bed\n", "!head DML-pH-25-Cov5-unique-CT-SNPs.bed\n", "!wc -l DML-pH-25-Cov5-unique-CT-SNPs.bed" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_047559.1\t22468723\t22468725\t28.4117647058823\n", "NC_047559.1\t44801744\t44801746\t34.0988480118915\n", "NC_047561.1\t28489237\t28489239\t-25.6018518518519\n", "NC_047565.1\t11970715\t11970717\t46.6938636749958\n", "NC_047568.1\t46583284\t46583286\t-33.1582332761578\n", " 5 DML-ploidy-25-Cov5-unique-CT-SNPs.bed\n" ] } ], "source": [ "!{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a DML-ploidy-25-Cov5.csv.bed \\\n", "-b /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/unique-CT-SNPs.bed \\\n", "> DML-ploidy-25-Cov5-unique-CT-SNPs.bed\n", "!head DML-ploidy-25-Cov5-unique-CT-SNPs.bed\n", "!wc -l DML-ploidy-25-Cov5-unique-CT-SNPs.bed" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0 DML-Cov5-Overlaps-unique-CT-SNPs.bed\r\n" ] } ], "source": [ "!{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a DML-Cov5-Overlaps.bed \\\n", "-b 
/Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/unique-CT-SNPs.bed \\\n", "> DML-Cov5-Overlaps-unique-CT-SNPs.bed\n", "!head DML-Cov5-Overlaps-unique-CT-SNPs.bed\n", "!wc -l DML-Cov5-Overlaps-unique-CT-SNPs.bed" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2c. Remove C->T SNPs from DML lists" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_047559.1\t5294172\t5294174\t40.2560083594566\n", "NC_047560.1\t66080783\t66080785\t-51.6483516483517\n", "NC_047560.1\t72583152\t72583154\t-40\n", "NC_047561.1\t10147466\t10147468\t-30.4647676161919\n", "NC_047561.1\t11783086\t11783088\t-44.1576698155646\n", "NC_047561.1\t16521359\t16521361\t28.8444735692442\n", "NC_047561.1\t19286180\t19286182\t-55.4137931034483\n", "NC_047561.1\t19545407\t19545409\t-41.4451612903226\n", "NC_047561.1\t21915577\t21915579\t46.9271523178808\n", "NC_047561.1\t31290734\t31290736\t-30.2791262135922\n", " 34 DML-pH-25-Cov5-NO-SNPs.bed\n" ] } ], "source": [ "!{bedtoolsDirectory}subtractBed \\\n", "-a DML-pH-25-Cov5.csv.bed \\\n", "-b DML-pH-25-Cov5-unique-CT-SNPs.bed \\\n", "> DML-pH-25-Cov5-NO-SNPs.bed\n", "!head DML-pH-25-Cov5-NO-SNPs.bed\n", "!wc -l DML-pH-25-Cov5-NO-SNPs.bed" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_047559.1\t12799610\t12799612\t27.7297297297297\n", "NC_047559.1\t53732861\t53732863\t25.8426966292135\n", "NC_047561.1\t9365798\t9365800\t34.0129358830146\n", "NC_047561.1\t40362698\t40362700\t29.4117647058824\n", "NC_047563.1\t39926052\t39926054\t42.6872058194266\n", "NC_047564.1\t23049738\t23049740\t29.2845880961766\n", "NC_047564.1\t24426622\t24426624\t-30.0865800865801\n", "NC_047564.1\t25380708\t25380710\t-40.1414677276746\n", "NC_047565.1\t10523508\t10523510\t38.0689469431726\n", "NC_047565.1\t13203393\t13203395\t41.1725955204216\n", " 24 
DML-ploidy-25-Cov5-NO-SNPs.bed\n" ] } ], "source": [ "!{bedtoolsDirectory}subtractBed \\\n", "-a DML-ploidy-25-Cov5.csv.bed \\\n", "-b DML-ploidy-25-Cov5-unique-CT-SNPs.bed \\\n", "> DML-ploidy-25-Cov5-NO-SNPs.bed\n", "!head DML-ploidy-25-Cov5-NO-SNPs.bed\n", "!wc -l DML-ploidy-25-Cov5-NO-SNPs.bed" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_047561.1\t40362698\t40362700\t-31.0344827586207\n", " 1 DML-Cov5-Overlaps-NO-SNPs.bed\n" ] } ], "source": [ "!{bedtoolsDirectory}subtractBed \\\n", "-a DML-Cov5-Overlaps.bed \\\n", "-b DML-Cov5-Overlaps-unique-CT-SNPs.bed \\\n", "> DML-Cov5-Overlaps-NO-SNPs.bed\n", "!head DML-Cov5-Overlaps-NO-SNPs.bed\n", "!wc -l DML-Cov5-Overlaps-NO-SNPs.bed" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Characterize SNP-free DML lists" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 24\n", " 10\n" ] } ], "source": [ "#Count hypomethylated DML (negative meth.diff in column 4)\n", "#Count hypermethylated DML (positive meth.diff in column 4)\n", "!cut -f4 DML-pH-25-Cov5-NO-SNPs.bed | grep \"^-\" | wc -l\n", "!cut -f4 DML-pH-25-Cov5-NO-SNPs.bed | grep -v \"^-\" | wc -l" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 8\n", " 16\n" ] } ], "source": [ "#Count hypomethylated DML (negative meth.diff in column 4)\n", "#Count hypermethylated DML (positive meth.diff in column 4)\n", "!cut -f4 DML-ploidy-25-Cov5-NO-SNPs.bed | grep \"^-\" | wc -l\n", "!cut -f4 DML-ploidy-25-Cov5-NO-SNPs.bed | grep -v \"^-\" | wc -l" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Characterize genomic locations of DML" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I will look at overlaps between genome features and pH-DML, ploidy-DML, and DML common to both lists." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4a. 
Gene" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### pH" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_047559.1\t5294172\t5294174\t40.2560083594566\n", "NC_047560.1\t72583152\t72583154\t-40\n", "NC_047561.1\t10147466\t10147468\t-30.4647676161919\n", "NC_047561.1\t11783086\t11783088\t-44.1576698155646\n", "NC_047561.1\t16521359\t16521361\t28.8444735692442\n", "NC_047561.1\t19545407\t19545409\t-41.4451612903226\n", "NC_047561.1\t31290734\t31290736\t-30.2791262135922\n", "NC_047561.1\t40362698\t40362700\t-31.0344827586207\n", "NC_047561.1\t46808693\t46808695\t-27.2727272727273\n", "NC_047563.1\t11760749\t11760751\t-34.033180778032\n", " 28 DML-pH-25-Cov5-Gene.bed\n" ] } ], "source": [ "#Find overlaps between DML and feature\n", "#Look at output\n", "#Count number of overlaps\n", "\n", "!{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a DML-pH-25-Cov5-NO-SNPs.bed \\\n", "-b ../../genome-feature-files/cgigas_uk_roslin_v1_gene.gff \\\n", "> DML-pH-25-Cov5-Gene.bed\n", "!head DML-pH-25-Cov5-Gene.bed\n", "!wc -l DML-pH-25-Cov5-Gene.bed" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_047559.1\t5294172\t5294174\t40.2560083594566\tNC_047559.1\tGnomon\tgene\t5232741\t5314657\t.\t+\t.\tID=gene-LOC105323223;Dbxref=GeneID:105323223;Name=LOC105323223;gbkey=Gene;gene=LOC105323223;gene_biotype=protein_coding\r\n", "NC_047560.1\t72583152\t72583154\t-40\tNC_047560.1\tGnomon\tgene\t72526541\t72603486\t.\t-\t.\tID=gene-LOC105330929;Dbxref=GeneID:105330929;Name=LOC105330929;gbkey=Gene;gene=LOC105330929;gene_biotype=protein_coding\r\n", "NC_047561.1\t10147466\t10147468\t-30.4647676161919\tNC_047561.1\tGnomon\tgene\t10126075\t10148544\t.\t+\t.\tID=gene-LOC105337008;Dbxref=GeneID:105337008;Name=LOC105337008;gbkey=Gene;gene=LOC105337008;gene_biotype=protein_coding\r\n", 
"NC_047561.1\t11783086\t11783088\t-44.1576698155646\tNC_047561.1\tGnomon\tgene\t11750567\t11834596\t.\t-\t.\tID=gene-LOC105346952;Dbxref=GeneID:105346952;Name=LOC105346952;gbkey=Gene;gene=LOC105346952;gene_biotype=protein_coding\r\n", "NC_047561.1\t16521359\t16521361\t28.8444735692442\tNC_047561.1\tGnomon\tgene\t16519780\t16543976\t.\t-\t.\tID=gene-LOC105345244;Dbxref=GeneID:105345244;Name=LOC105345244;gbkey=Gene;gene=LOC105345244;gene_biotype=protein_coding\r\n", "NC_047561.1\t19545407\t19545409\t-41.4451612903226\tNC_047561.1\tGnomon\tgene\t19544914\t19552612\t.\t-\t.\tID=gene-LOC105335660;Dbxref=GeneID:105335660;Name=LOC105335660;gbkey=Gene;gene=LOC105335660;gene_biotype=protein_coding\r\n", "NC_047561.1\t31290734\t31290736\t-30.2791262135922\tNC_047561.1\tGnomon\tgene\t31288648\t31293566\t.\t+\t.\tID=gene-LOC105346771;Dbxref=GeneID:105346771;Name=LOC105346771;gbkey=Gene;gene=LOC105346771;gene_biotype=protein_coding\r\n", "NC_047561.1\t40362698\t40362700\t-31.0344827586207\tNC_047561.1\tGnomon\tgene\t40358245\t40364606\t.\t+\t.\tID=gene-LOC105324542;Dbxref=GeneID:105324542;Name=LOC105324542;gbkey=Gene;gene=LOC105324542;gene_biotype=protein_coding\r\n", "NC_047561.1\t46808693\t46808695\t-27.2727272727273\tNC_047561.1\tGnomon\tgene\t46808330\t46820255\t.\t+\t.\tID=gene-LOC105321186;Dbxref=GeneID:105321186;Name=LOC105321186;gbkey=Gene;gene=LOC105321186;gene_biotype=protein_coding\r\n", "NC_047563.1\t11760749\t11760751\t-34.033180778032\tNC_047563.1\tGnomon\tgene\t11760153\t11763209\t.\t+\t.\tID=gene-LOC105334771;Dbxref=GeneID:105334771;Name=LOC105334771;gbkey=Gene;gene=LOC105334771;gene_biotype=protein_coding\r\n" ] } ], "source": [ "#Find overlaps between DML and genes\n", "#Include original entry from gene GFF for each overlap, which will be used in downstream enrichment analyses (wb)\n", "#Look at output. 
Do not count overlaps here, since a DML that falls in more than one gene is reported once per gene\n", "\n", "!{bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a DML-pH-25-Cov5-NO-SNPs.bed \\\n", "-b ../../genome-feature-files/cgigas_uk_roslin_v1_gene.gff \\\n", "> DML-pH-25-Cov5-Gene-wb.bed\n", "!head DML-pH-25-Cov5-Gene-wb.bed" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 29\r\n" ] } ], "source": [ "#Isolate GFF attributes column (column 13)\n", "#Translate ; and = to tabs\n", "#Isolate gene IDs (field 6, the Name value)\n", "#Sort and identify unique gene IDs\n", "#Count the number of unique gene IDs that contain DML\n", "\n", "!cut -f13 DML-pH-25-Cov5-Gene-wb.bed \\\n", "| tr \";\" \"\\t\" \\\n", "| tr \"=\" \"\\t\" \\\n", "| cut -f6 \\\n", "| sort | uniq \\\n", "| wc -l" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "LOC105323223\r\n", "LOC105330929\r\n", "LOC105337008\r\n", "LOC105346952\r\n", "LOC105345244\r\n", "LOC105335660\r\n", "LOC105346771\r\n", "LOC105324542\r\n", "LOC105321186\r\n", "LOC105334771\r\n" ] } ], "source": [ "#Isolate gene ID information and save\n", "\n", "!cut -f13 DML-pH-25-Cov5-Gene-wb.bed \\\n", "| tr \";\" \"\\t\" \\\n", "| tr \"=\" \"\\t\" \\\n", "| cut -f6 \\\n", "> geneID-pH-DML-overlap.tab\n", "!head geneID-pH-DML-overlap.tab" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### ploidy" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_047559.1\t12799610\t12799612\t27.7297297297297\n", "NC_047561.1\t9365798\t9365800\t34.0129358830146\n", "NC_047561.1\t40362698\t40362700\t29.4117647058824\n", "NC_047563.1\t39926052\t39926054\t42.6872058194266\n", "NC_047564.1\t23049738\t23049740\t29.2845880961766\n", "NC_047564.1\t24426622\t24426624\t-30.0865800865801\n", 
"NC_047564.1\t25380708\t25380710\t-40.1414677276746\n", "NC_047565.1\t10523508\t10523510\t38.0689469431726\n", "NC_047565.1\t13203393\t13203395\t41.1725955204216\n", "NC_047565.1\t14899959\t14899961\t32.5955265610438\n", " 20 DML-ploidy-25-Cov5-Gene.bed\n" ] } ], "source": [ "!{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a DML-ploidy-25-Cov5-NO-SNPs.bed \\\n", "-b ../../genome-feature-files/cgigas_uk_roslin_v1_gene.gff \\\n", "> DML-ploidy-25-Cov5-Gene.bed\n", "!head DML-ploidy-25-Cov5-Gene.bed\n", "!wc -l DML-ploidy-25-Cov5-Gene.bed" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_047559.1\t12799610\t12799612\t27.7297297297297\tNC_047559.1\tGnomon\tgene\t12794201\t12802669\t.\t-\t.\tID=gene-LOC105348590;Dbxref=GeneID:105348590;Name=LOC105348590;gbkey=Gene;gene=LOC105348590;gene_biotype=protein_coding\r\n", "NC_047561.1\t9365798\t9365800\t34.0129358830146\tNC_047561.1\tGnomon\tgene\t9361078\t9371161\t.\t+\t.\tID=gene-LOC105331136;Dbxref=GeneID:105331136;Name=LOC105331136;gbkey=Gene;gene=LOC105331136;gene_biotype=protein_coding\r\n", "NC_047561.1\t40362698\t40362700\t29.4117647058824\tNC_047561.1\tGnomon\tgene\t40358245\t40364606\t.\t+\t.\tID=gene-LOC105324542;Dbxref=GeneID:105324542;Name=LOC105324542;gbkey=Gene;gene=LOC105324542;gene_biotype=protein_coding\r\n", "NC_047563.1\t39926052\t39926054\t42.6872058194266\tNC_047563.1\tGnomon\tgene\t39899519\t39927142\t.\t-\t.\tID=gene-LOC105326839;Dbxref=GeneID:105326839;Name=LOC105326839;gbkey=Gene;gene=LOC105326839;gene_biotype=protein_coding\r\n", "NC_047564.1\t23049738\t23049740\t29.2845880961766\tNC_047564.1\tGnomon\tgene\t23026724\t23059519\t.\t+\t.\tID=gene-LOC105337762;Dbxref=GeneID:105337762;Name=LOC105337762;gbkey=Gene;gene=LOC105337762;gene_biotype=protein_coding\r\n", 
"NC_047564.1\t24426622\t24426624\t-30.0865800865801\tNC_047564.1\tGnomon\tgene\t24422805\t24429598\t.\t-\t.\tID=gene-LOC105328665;Dbxref=GeneID:105328665;Name=LOC105328665;gbkey=Gene;gene=LOC105328665;gene_biotype=protein_coding\r\n", "NC_047564.1\t25380708\t25380710\t-40.1414677276746\tNC_047564.1\tGnomon\tgene\t25378564\t25382046\t.\t+\t.\tID=gene-LOC105317478;Dbxref=GeneID:105317478;Name=LOC105317478;gbkey=Gene;gene=LOC105317478;gene_biotype=protein_coding\r\n", "NC_047565.1\t10523508\t10523510\t38.0689469431726\tNC_047565.1\tGnomon\tgene\t10468342\t10543074\t.\t+\t.\tID=gene-LOC105320306;Dbxref=GeneID:105320306;Name=LOC105320306;gbkey=Gene;gene=LOC105320306;gene_biotype=protein_coding\r\n", "NC_047565.1\t13203393\t13203395\t41.1725955204216\tNC_047565.1\tGnomon\tgene\t13119330\t13220090\t.\t+\t.\tID=gene-LOC105329024;Dbxref=GeneID:105329024;Name=LOC105329024;gbkey=Gene;gene=LOC105329024;gene_biotype=protein_coding\r\n", "NC_047565.1\t14899959\t14899961\t32.5955265610438\tNC_047565.1\tGnomon\tgene\t14899770\t14913252\t.\t-\t.\tID=gene-LOC117681859;Dbxref=GeneID:117681859;Name=LOC117681859;gbkey=Gene;gene=LOC117681859;gene_biotype=protein_coding\r\n" ] } ], "source": [ "!{bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a DML-ploidy-25-Cov5-NO-SNPs.bed \\\n", "-b ../../genome-feature-files/cgigas_uk_roslin_v1_gene.gff \\\n", "> DML-ploidy-25-Cov5-Gene-wb.bed\n", "!head DML-ploidy-25-Cov5-Gene-wb.bed" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 20\r\n" ] } ], "source": [ "#Isolate GFF attributes column (column 13)\n", "#Translate ; and = to tabs\n", "#Isolate gene IDs (field 6, the Name value)\n", "#Sort and identify unique gene IDs\n", "#Count the number of unique gene IDs that contain DML\n", "\n", "!cut -f13 DML-ploidy-25-Cov5-Gene-wb.bed \\\n", "| tr \";\" \"\\t\" \\\n", "| tr \"=\" \"\\t\" \\\n", "| cut -f6 \\\n", "| sort | uniq \\\n", "| wc -l" ] }, { "cell_type": "code", "execution_count": 
51, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "LOC105348590\r\n", "LOC105331136\r\n", "LOC105324542\r\n", "LOC105326839\r\n", "LOC105337762\r\n", "LOC105328665\r\n", "LOC105317478\r\n", "LOC105320306\r\n", "LOC105329024\r\n", "LOC117681859\r\n" ] } ], "source": [ "#Isolate gene ID information and save\n", "\n", "!cut -f13 DML-ploidy-25-Cov5-Gene-wb.bed \\\n", "| tr \";\" \"\\t\" \\\n", "| tr \"=\" \"\\t\" \\\n", "| cut -f6 \\\n", "> geneID-ploidy-DML-overlap.tab\n", "!head geneID-ploidy-DML-overlap.tab" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### common" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_047561.1\t40362698\t40362700\t-31.0344827586207\n", " 1 DML-Cov5-Overlaps-Gene.bed\n" ] } ], "source": [ "!{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a DML-Cov5-Overlaps-NO-SNPs.bed \\\n", "-b ../../genome-feature-files/cgigas_uk_roslin_v1_gene.gff \\\n", "> DML-Cov5-Overlaps-Gene.bed\n", "!head DML-Cov5-Overlaps-Gene.bed\n", "!wc -l DML-Cov5-Overlaps-Gene.bed" ] }, { "cell_type": "code", "execution_count": 97, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_047561.1\t40362698\t40362700\t-31.0344827586207\tNC_047561.1\tGnomon\tgene\t40358245\t40364606\t.\t+\t.\tID=gene-LOC105324542;Dbxref=GeneID:105324542;Name=LOC105324542;gbkey=Gene;gene=LOC105324542;gene_biotype=protein_coding\r\n" ] } ], "source": [ "!{bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a DML-Cov5-Overlaps-NO-SNPs.bed \\\n", "-b ../../genome-feature-files/cgigas_uk_roslin_v1_gene.gff \\\n", "> DML-Cov5-Overlaps-Gene-wb.bed\n", "!head DML-Cov5-Overlaps-Gene-wb.bed" ] }, { "cell_type": "code", "execution_count": 98, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 1\r\n" ] } ], "source": [ "#Isolate GFF attributes column (column 13)\n", 
"#Translate ; and = to tabs\n", "#Isolate gene IDs (field 6, the Name value)\n", "#Sort and identify unique gene IDs\n", "#Count the number of unique gene IDs that contain DML\n", "\n", "!cut -f13 DML-Cov5-Overlaps-Gene-wb.bed \\\n", "| tr \";\" \"\\t\" \\\n", "| tr \"=\" \"\\t\" \\\n", "| cut -f6 \\\n", "| sort | uniq \\\n", "| wc -l" ] }, { "cell_type": "code", "execution_count": 99, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "LOC105324542\r\n" ] } ], "source": [ "#Isolate gene ID information and save\n", "\n", "!cut -f13 DML-Cov5-Overlaps-Gene-wb.bed \\\n", "| tr \";\" \"\\t\" \\\n", "| tr \"=\" \"\\t\" \\\n", "| cut -f6 \\\n", "> geneID-Cov5-Overlaps-DML-overlap.tab\n", "!head geneID-Cov5-Overlaps-DML-overlap.tab" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4b. Exon UTR" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_047561.1\t10147466\t10147468\t-30.4647676161919\n", "NC_047563.1\t11760749\t11760751\t-34.033180778032\n", "NC_047564.1\t43801732\t43801734\t-26.7326732673267\n", "NC_047565.1\t4762558\t4762560\t-26.7316669176329\n", "NC_047566.1\t9548317\t9548319\t-34.3623481781376\n", " 5 DML-pH-25-Cov5-exonUTR.bed\n" ] } ], "source": [ "!{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a DML-pH-25-Cov5-NO-SNPs.bed \\\n", "-b ../../genome-feature-files/cgigas_uk_roslin_v1_exonUTR.gff \\\n", "> DML-pH-25-Cov5-exonUTR.bed\n", "!head DML-pH-25-Cov5-exonUTR.bed\n", "!wc -l DML-pH-25-Cov5-exonUTR.bed" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0 DML-ploidy-25-Cov5-exonUTR.bed\r\n" ] } ], "source": [ "!{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a DML-ploidy-25-Cov5-NO-SNPs.bed \\\n", "-b ../../genome-feature-files/cgigas_uk_roslin_v1_exonUTR.gff \\\n", "> 
DML-ploidy-25-Cov5-exonUTR.bed\n", "!head DML-ploidy-25-Cov5-exonUTR.bed\n", "!wc -l DML-ploidy-25-Cov5-exonUTR.bed" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0 DML-Cov5-Overlaps-exonUTR.bed\r\n" ] } ], "source": [ "!{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a DML-Cov5-Overlaps-NO-SNPs.bed \\\n", "-b ../../genome-feature-files/cgigas_uk_roslin_v1_exonUTR.gff \\\n", "> DML-Cov5-Overlaps-exonUTR.bed\n", "!head DML-Cov5-Overlaps-exonUTR.bed\n", "!wc -l DML-Cov5-Overlaps-exonUTR.bed" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4c. CDS" ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_047561.1\t11783086\t11783088\t-44.1576698155646\n", "NC_047561.1\t40362698\t40362700\t-31.0344827586207\n", "NC_047567.1\t22295946\t22295948\t-26.9118276501641\n", " 3 DML-pH-25-Cov5-CDS.bed\n" ] } ], "source": [ "!{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a DML-pH-25-Cov5-NO-SNPs.bed \\\n", "-b ../../genome-feature-files/cgigas_uk_roslin_v1_CDS.gff \\\n", "> DML-pH-25-Cov5-CDS.bed\n", "!head DML-pH-25-Cov5-CDS.bed\n", "!wc -l DML-pH-25-Cov5-CDS.bed" ] }, { "cell_type": "code", "execution_count": 60, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_047559.1\t12799610\t12799612\t27.7297297297297\n", "NC_047561.1\t40362698\t40362700\t29.4117647058824\n", "NC_047564.1\t23049738\t23049740\t29.2845880961766\n", "NC_047564.1\t24426622\t24426624\t-30.0865800865801\n", "NC_047566.1\t46447078\t46447080\t37.3155447746109\n", " 5 DML-ploidy-25-Cov5-CDS.bed\n" ] } ], "source": [ "!{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a DML-ploidy-25-Cov5-NO-SNPs.bed \\\n", "-b ../../genome-feature-files/cgigas_uk_roslin_v1_CDS.gff \\\n", "> DML-ploidy-25-Cov5-CDS.bed\n", "!head 
DML-ploidy-25-Cov5-CDS.bed\n", "!wc -l DML-ploidy-25-Cov5-CDS.bed" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_047561.1\t40362698\t40362700\t-31.0344827586207\n", " 1 DML-Cov5-Overlaps-CDS.bed\n" ] } ], "source": [ "!{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a DML-Cov5-Overlaps-NO-SNPs.bed \\\n", "-b ../../genome-feature-files/cgigas_uk_roslin_v1_CDS.gff \\\n", "> DML-Cov5-Overlaps-CDS.bed\n", "!head DML-Cov5-Overlaps-CDS.bed\n", "!wc -l DML-Cov5-Overlaps-CDS.bed" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4d. Intron" ] }, { "cell_type": "code", "execution_count": 69, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_047559.1\t5294172\t5294174\t40.2560083594566\n", "NC_047560.1\t72583152\t72583154\t-40\n", "NC_047561.1\t16521359\t16521361\t28.8444735692442\n", "NC_047561.1\t19545407\t19545409\t-41.4451612903226\n", "NC_047561.1\t31290734\t31290736\t-30.2791262135922\n", "NC_047561.1\t46808693\t46808695\t-27.2727272727273\n", "NC_047563.1\t66794619\t66794621\t-29.651103651714\n", "NC_047564.1\t2678443\t2678445\t-45.6953642384106\n", "NC_047565.1\t10619872\t10619874\t-25.6880733944954\n", "NC_047565.1\t24575356\t24575358\t-28.0575539568345\n", " 20 DML-pH-25-Cov5-intron.bed\n" ] } ], "source": [ "!{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a DML-pH-25-Cov5-NO-SNPs.bed \\\n", "-b ../../genome-feature-files/cgigas_uk_roslin_v1_intron.bed \\\n", "> DML-pH-25-Cov5-intron.bed\n", "!head DML-pH-25-Cov5-intron.bed\n", "!wc -l DML-pH-25-Cov5-intron.bed" ] }, { "cell_type": "code", "execution_count": 70, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_047561.1\t9365798\t9365800\t34.0129358830146\n", "NC_047563.1\t39926052\t39926054\t42.6872058194266\n", "NC_047564.1\t25380708\t25380710\t-40.1414677276746\n", 
"NC_047565.1\t10523508\t10523510\t38.0689469431726\n", "NC_047565.1\t13203393\t13203395\t41.1725955204216\n", "NC_047565.1\t14899959\t14899961\t32.5955265610438\n", "NC_047566.1\t27129225\t27129227\t37.7269975786925\n", "NC_047566.1\t35988011\t35988013\t-53.0531425651507\n", "NC_047566.1\t46084094\t46084096\t-32.3234916559692\n", "NC_047566.1\t50117081\t50117083\t32.0492517222266\n", " 15 DML-ploidy-25-Cov5-intron.bed\n" ] } ], "source": [ "!{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a DML-ploidy-25-Cov5-NO-SNPs.bed \\\n", "-b ../../genome-feature-files/cgigas_uk_roslin_v1_intron.bed \\\n", "> DML-ploidy-25-Cov5-intron.bed\n", "!head DML-ploidy-25-Cov5-intron.bed\n", "!wc -l DML-ploidy-25-Cov5-intron.bed" ] }, { "cell_type": "code", "execution_count": 71, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0 DML-Cov5-Overlaps-intron.bed\r\n" ] } ], "source": [ "!{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a DML-Cov5-Overlaps-NO-SNPs.bed \\\n", "-b ../../genome-feature-files/cgigas_uk_roslin_v1_intron.bed \\\n", "> DML-Cov5-Overlaps-intron.bed\n", "!head DML-Cov5-Overlaps-intron.bed\n", "!wc -l DML-Cov5-Overlaps-intron.bed" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4e. 
Upstream flanks" ] }, { "cell_type": "code", "execution_count": 72, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0 DML-pH-25-Cov5-upstream.bed\r\n" ] } ], "source": [ "!{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a DML-pH-25-Cov5-NO-SNPs.bed \\\n", "-b ../../genome-feature-files/cgigas_uk_roslin_v1_upstream.gff \\\n", "> DML-pH-25-Cov5-upstream.bed\n", "!head DML-pH-25-Cov5-upstream.bed\n", "!wc -l DML-pH-25-Cov5-upstream.bed" ] }, { "cell_type": "code", "execution_count": 73, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0 DML-ploidy-25-Cov5-upstream.bed\r\n" ] } ], "source": [ "!{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a DML-ploidy-25-Cov5-NO-SNPs.bed \\\n", "-b ../../genome-feature-files/cgigas_uk_roslin_v1_upstream.gff \\\n", "> DML-ploidy-25-Cov5-upstream.bed\n", "!head DML-ploidy-25-Cov5-upstream.bed\n", "!wc -l DML-ploidy-25-Cov5-upstream.bed" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0 DML-Cov5-Overlaps-upstream.bed\r\n" ] } ], "source": [ "!{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a DML-Cov5-Overlaps-NO-SNPs.bed \\\n", "-b ../../genome-feature-files/cgigas_uk_roslin_v1_upstream.gff \\\n", "> DML-Cov5-Overlaps-upstream.bed\n", "!head DML-Cov5-Overlaps-upstream.bed\n", "!wc -l DML-Cov5-Overlaps-upstream.bed" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4f. 
Downstream flanks" ] }, { "cell_type": "code", "execution_count": 75, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_047561.1\t19286180\t19286182\t-55.4137931034483\n", "NC_047561.1\t21915577\t21915579\t46.9271523178808\n", "NC_047567.1\t16984837\t16984839\t42.8241335044929\n", "NW_022994991.1\t19672\t19674\t36.769801980198\n", " 4 DML-pH-25-Cov5-downstream.bed\n" ] } ], "source": [ "!{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a DML-pH-25-Cov5-NO-SNPs.bed \\\n", "-b ../../genome-feature-files/cgigas_uk_roslin_v1_downstream.gff \\\n", "> DML-pH-25-Cov5-downstream.bed\n", "!head DML-pH-25-Cov5-downstream.bed\n", "!wc -l DML-pH-25-Cov5-downstream.bed" ] }, { "cell_type": "code", "execution_count": 76, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_047566.1\t24265305\t24265307\t-26.1261261261261\n", " 1 DML-ploidy-25-Cov5-downstream.bed\n" ] } ], "source": [ "!{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a DML-ploidy-25-Cov5-NO-SNPs.bed \\\n", "-b ../../genome-feature-files/cgigas_uk_roslin_v1_downstream.gff \\\n", "> DML-ploidy-25-Cov5-downstream.bed\n", "!head DML-ploidy-25-Cov5-downstream.bed\n", "!wc -l DML-ploidy-25-Cov5-downstream.bed" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0 DML-Cov5-Overlaps-downstream.bed\r\n" ] } ], "source": [ "!{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a DML-Cov5-Overlaps-NO-SNPs.bed \\\n", "-b ../../genome-feature-files/cgigas_uk_roslin_v1_downstream.gff \\\n", "> DML-Cov5-Overlaps-downstream.bed\n", "!head DML-Cov5-Overlaps-downstream.bed\n", "!wc -l DML-Cov5-Overlaps-downstream.bed" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4g. 
Intergenic regions" ] }, { "cell_type": "code", "execution_count": 78, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_047560.1\t66080783\t66080785\t-51.6483516483517\n", "NC_047565.1\t44521815\t44521817\t-30.3333333333333\n", " 2 DML-pH-25-Cov5-intergenic.bed\n" ] } ], "source": [ "!{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a DML-pH-25-Cov5-NO-SNPs.bed \\\n", "-b ../../genome-feature-files/cgigas_uk_roslin_v1_intergenic.bed \\\n", "> DML-pH-25-Cov5-intergenic.bed\n", "!head DML-pH-25-Cov5-intergenic.bed\n", "!wc -l DML-pH-25-Cov5-intergenic.bed" ] }, { "cell_type": "code", "execution_count": 79, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_047559.1\t53732861\t53732863\t25.8426966292135\n", "NC_047566.1\t24266096\t24266098\t-29.4736842105263\n", "NC_047566.1\t24266109\t24266111\t-27.7777777777778\n", " 3 DML-ploidy-25-Cov5-intergenic.bed\n" ] } ], "source": [ "!{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a DML-ploidy-25-Cov5-NO-SNPs.bed \\\n", "-b ../../genome-feature-files/cgigas_uk_roslin_v1_intergenic.bed \\\n", "> DML-ploidy-25-Cov5-intergenic.bed\n", "!head DML-ploidy-25-Cov5-intergenic.bed\n", "!wc -l DML-ploidy-25-Cov5-intergenic.bed" ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0 DML-Cov5-Overlaps-intergenic.bed\r\n" ] } ], "source": [ "!{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a DML-Cov5-Overlaps-NO-SNPs.bed \\\n", "-b ../../genome-feature-files/cgigas_uk_roslin_v1_intergenic.bed \\\n", "> DML-Cov5-Overlaps-intergenic.bed\n", "!head DML-Cov5-Overlaps-intergenic.bed\n", "!wc -l DML-Cov5-Overlaps-intergenic.bed" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4h. 
lncRNA" ] }, { "cell_type": "code", "execution_count": 81, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_047564.1\t43801732\t43801734\t-26.7326732673267\n", "NC_047565.1\t44578741\t44578743\t-26.7896446913321\n", "NC_047566.1\t9548317\t9548319\t-34.3623481781376\n", " 3 DML-pH-25-Cov5-lncRNA.bed\n" ] } ], "source": [ "!{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a DML-pH-25-Cov5-NO-SNPs.bed \\\n", "-b ../../genome-feature-files/cgigas_uk_roslin_v1_lncRNA.gff \\\n", "> DML-pH-25-Cov5-lncRNA.bed\n", "!head DML-pH-25-Cov5-lncRNA.bed\n", "!wc -l DML-pH-25-Cov5-lncRNA.bed" ] }, { "cell_type": "code", "execution_count": 82, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0 DML-ploidy-25-Cov5-lncRNA.bed\r\n" ] } ], "source": [ "!{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a DML-ploidy-25-Cov5-NO-SNPs.bed \\\n", "-b ../../genome-feature-files/cgigas_uk_roslin_v1_lncRNA.gff \\\n", "> DML-ploidy-25-Cov5-lncRNA.bed\n", "!head DML-ploidy-25-Cov5-lncRNA.bed\n", "!wc -l DML-ploidy-25-Cov5-lncRNA.bed" ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0 DML-Cov5-Overlaps-lncRNA.bed\r\n" ] } ], "source": [ "!{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a DML-Cov5-Overlaps-NO-SNPs.bed \\\n", "-b ../../genome-feature-files/cgigas_uk_roslin_v1_lncRNA.gff \\\n", "> DML-Cov5-Overlaps-lncRNA.bed\n", "!head DML-Cov5-Overlaps-lncRNA.bed\n", "!wc -l DML-Cov5-Overlaps-lncRNA.bed" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4i. 
Transposable elements" ] }, { "cell_type": "code", "execution_count": 84, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_047559.1\t5294172\t5294174\t40.2560083594566\n", "NC_047560.1\t66080783\t66080785\t-51.6483516483517\n", "NC_047561.1\t19286180\t19286182\t-55.4137931034483\n", "NC_047561.1\t21915577\t21915579\t46.9271523178808\n", "NC_047564.1\t2678443\t2678445\t-45.6953642384106\n", "NC_047565.1\t10619872\t10619874\t-25.6880733944954\n", "NC_047565.1\t44521815\t44521817\t-30.3333333333333\n", "NC_047565.1\t44578741\t44578743\t-26.7896446913321\n", "NC_047566.1\t23226898\t23226900\t25.3731343283582\n", "NC_047567.1\t16984837\t16984839\t42.8241335044929\n", " 15 DML-pH-25-Cov5-TE.bed\n" ] } ], "source": [ "!{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a DML-pH-25-Cov5-NO-SNPs.bed \\\n", "-b ../../genome-feature-files/cgigas_uk_roslin_v1_rm.te.bed \\\n", "> DML-pH-25-Cov5-TE.bed\n", "!head DML-pH-25-Cov5-TE.bed\n", "!wc -l DML-pH-25-Cov5-TE.bed" ] }, { "cell_type": "code", "execution_count": 85, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_047559.1\t53732861\t53732863\t25.8426966292135\n", "NC_047561.1\t9365798\t9365800\t34.0129358830146\n", "NC_047563.1\t39926052\t39926054\t42.6872058194266\n", "NC_047566.1\t50117081\t50117083\t32.0492517222266\n", "NC_047566.1\t51204319\t51204321\t35.812086064308\n", "NC_047567.1\t21017447\t21017449\t34.8875423641779\n", " 6 DML-ploidy-25-Cov5-TE.bed\n" ] } ], "source": [ "!{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a DML-ploidy-25-Cov5-NO-SNPs.bed \\\n", "-b ../../genome-feature-files/cgigas_uk_roslin_v1_rm.te.bed \\\n", "> DML-ploidy-25-Cov5-TE.bed\n", "!head DML-ploidy-25-Cov5-TE.bed\n", "!wc -l DML-ploidy-25-Cov5-TE.bed" ] }, { "cell_type": "code", "execution_count": 86, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0 DML-Cov5-Overlaps-TE.bed\r\n" ]
} ], "source": [ "!{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a DML-Cov5-Overlaps-NO-SNPs.bed \\\n", "-b ../../genome-feature-files/cgigas_uk_roslin_v1_rm.te.bed \\\n", "> DML-Cov5-Overlaps-TE.bed\n", "!head DML-Cov5-Overlaps-TE.bed\n", "!wc -l DML-Cov5-Overlaps-TE.bed" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## 5. Obtain line counts for overlap files" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These counts will be used for downstream visualization." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5a. pH-DML" ] }, { "cell_type": "code", "execution_count": 87, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "DML-pH-25-Cov5-CDS.bed\r\n", "DML-pH-25-Cov5-Gene-wb.bed\r\n", "DML-pH-25-Cov5-Gene.bed\r\n", "DML-pH-25-Cov5-NO-SNPs.bed\r\n", "DML-pH-25-Cov5-TE.bed\r\n", "DML-pH-25-Cov5-downstream.bed\r\n", "DML-pH-25-Cov5-exonUTR.bed\r\n", "DML-pH-25-Cov5-intergenic.bed\r\n", "DML-pH-25-Cov5-intron.bed\r\n", "DML-pH-25-Cov5-lncRNA.bed\r\n", "DML-pH-25-Cov5-unique-CT-SNPs.bed\r\n", "DML-pH-25-Cov5-upstream.bed\r\n", "DML-pH-25-Cov5.csv.bed\r\n" ] } ], "source": [ "!find DML-pH-25-*" ] }, { "cell_type": "code", "execution_count": 90, "metadata": {}, "outputs": [], "source": [ "#Get line count for all DML overlap files\n", "#Remove line 13 onward (DML-pH-25-Cov5.csv.bed and total entries)\n", "#Remove 11th line (unique SNP overlaps)\n", "#Remove 4th line (true DML list)\n", "#Print in a tab-delimited format\n", "#Save output\n", "\n", "!wc -l DML-pH-25-* \\\n", "| sed '13,$ d' \\\n", "| sed '11d' \\\n", "| sed '4d' \\\n", "| awk '{print $1\"\\t\"$2}' \\\n", "> DML-pH-25-Overlap-counts.txt" ] }, { "cell_type": "code", "execution_count": 91, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3\tDML-pH-25-Cov5-CDS.bed\r\n", "31\tDML-pH-25-Cov5-Gene-wb.bed\r\n", "28\tDML-pH-25-Cov5-Gene.bed\r\n", "15\tDML-pH-25-Cov5-TE.bed\r\n", 
"4\tDML-pH-25-Cov5-downstream.bed\r\n", "5\tDML-pH-25-Cov5-exonUTR.bed\r\n", "2\tDML-pH-25-Cov5-intergenic.bed\r\n", "20\tDML-pH-25-Cov5-intron.bed\r\n", "3\tDML-pH-25-Cov5-lncRNA.bed\r\n", "0\tDML-pH-25-Cov5-upstream.bed\r\n" ] } ], "source": [ "!cat DML-pH-25-Overlap-counts.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5b. Ploidy-DML" ] }, { "cell_type": "code", "execution_count": 92, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "DML-ploidy-25-Cov5-CDS.bed\r\n", "DML-ploidy-25-Cov5-Gene-wb.bed\r\n", "DML-ploidy-25-Cov5-Gene.bed\r\n", "DML-ploidy-25-Cov5-NO-SNPs.bed\r\n", "DML-ploidy-25-Cov5-TE.bed\r\n", "DML-ploidy-25-Cov5-downstream.bed\r\n", "DML-ploidy-25-Cov5-exonUTR.bed\r\n", "DML-ploidy-25-Cov5-intergenic.bed\r\n", "DML-ploidy-25-Cov5-intron.bed\r\n", "DML-ploidy-25-Cov5-lncRNA.bed\r\n", "DML-ploidy-25-Cov5-unique-CT-SNPs.bed\r\n", "DML-ploidy-25-Cov5-upstream.bed\r\n", "DML-ploidy-25-Cov5.csv.bed\r\n" ] } ], "source": [ "!find DML-ploidy-25-*" ] }, { "cell_type": "code", "execution_count": 93, "metadata": {}, "outputs": [], "source": [ "#Get line count for all DML overlap files\n", "#Remove line 13 onward (DML-ploidy-25-Cov5.csv.bed and total entries)\n", "#Remove 11th line (unique SNP overlaps)\n", "#Remove 4th line (true DML list)\n", "#Print in a tab-delimited format\n", "#Save output\n", "\n", "!wc -l DML-ploidy-25-* \\\n", "| sed '13,$ d' \\\n", "| sed '11d' \\\n", "| sed '4d' \\\n", "| awk '{print $1\"\\t\"$2}' \\\n", "> DML-ploidy-25-Overlap-counts.txt" ] }, { "cell_type": "code", "execution_count": 94, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5\tDML-ploidy-25-Cov5-CDS.bed\r\n", "20\tDML-ploidy-25-Cov5-Gene-wb.bed\r\n", "20\tDML-ploidy-25-Cov5-Gene.bed\r\n", "6\tDML-ploidy-25-Cov5-TE.bed\r\n", "1\tDML-ploidy-25-Cov5-downstream.bed\r\n", "0\tDML-ploidy-25-Cov5-exonUTR.bed\r\n", "3\tDML-ploidy-25-Cov5-intergenic.bed\r\n", 
"15\tDML-ploidy-25-Cov5-intron.bed\r\n", 
"0\tDML-ploidy-25-Cov5-lncRNA.bed\r\n", "0\tDML-ploidy-25-Cov5-upstream.bed\r\n" ] } ], "source": [ "!cat DML-ploidy-25-Overlap-counts.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5c. Common DML" ] }, { "cell_type": "code", "execution_count": 100, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "DML-Cov5-Overlaps-CDS.bed\r\n", "DML-Cov5-Overlaps-Gene-wb.bed\r\n", "DML-Cov5-Overlaps-Gene.bed\r\n", "DML-Cov5-Overlaps-NO-SNPs.bed\r\n", "DML-Cov5-Overlaps-TE.bed\r\n", "DML-Cov5-Overlaps-downstream.bed\r\n", "DML-Cov5-Overlaps-exonUTR.bed\r\n", "DML-Cov5-Overlaps-intergenic.bed\r\n", "DML-Cov5-Overlaps-intron.bed\r\n", "DML-Cov5-Overlaps-lncRNA.bed\r\n", "DML-Cov5-Overlaps-unique-CT-SNPs.bed\r\n", "DML-Cov5-Overlaps-upstream.bed\r\n" ] } ], "source": [ "!find DML-Cov5-Overlaps-*" ] }, { "cell_type": "code", "execution_count": 114, "metadata": {}, "outputs": [], "source": [ "#Get line count for all DML overlap BEDfiles\n", "#(glob on *.bed so the counts file written below is not counted)\n", "#Remove line 13 onward (total entries)\n", "#Remove 11th line (unique SNP overlaps)\n", "#Remove 4th line (true DML list)\n", "#Print in a tab-delimited format\n", "#Save output\n", "\n", "!wc -l DML-Cov5-Overlaps-*.bed \\\n", "| sed '13,$ d' \\\n", "| sed '11d' \\\n", "| sed '4d' \\\n", "| awk '{print $1\"\\t\"$2}' \\\n", "> DML-Cov5-Overlaps-counts.txt" ] }, { "cell_type": "code", "execution_count": 115, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1\tDML-Cov5-Overlaps-CDS.bed\r\n", "1\tDML-Cov5-Overlaps-Gene-wb.bed\r\n", "1\tDML-Cov5-Overlaps-Gene.bed\r\n", "0\tDML-Cov5-Overlaps-TE.bed\r\n", "0\tDML-Cov5-Overlaps-downstream.bed\r\n", "0\tDML-Cov5-Overlaps-exonUTR.bed\r\n", "0\tDML-Cov5-Overlaps-intergenic.bed\r\n", "0\tDML-Cov5-Overlaps-intron.bed\r\n", "0\tDML-Cov5-Overlaps-lncRNA.bed\r\n", "0\tDML-Cov5-Overlaps-upstream.bed\r\n" ] } ], "source": [ "!cat DML-Cov5-Overlaps-counts.txt" ] }, { "cell_type": 
"code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 1 }