{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# DML and DMR Analysis\n", "\n", "In this notebook, I will examine the location of differentially methylated loci (DML) and regions (DMR) in the *C. virginica* genome. The DML and DMR were identified using methylKit in [this R script](https://github.com/fish546-2018/yaamini-virginica/tree/master/analyses/2018-10-25-MethylKit).\n", "\n", "Methods:\n", "\n", "1. Prepare for Analyses\n", "2. Locate Files and Set Variable Paths\n", "3. Identify Overlaps between Genomic Feature Tracks\n", "4. Gene Flanking" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 0. Prepare for Analyses" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 0a. Set Working Directory" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'/Users/yaamini/Documents/yaamini-virginica/notebooks'" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pwd" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Users/yaamini/Documents/yaamini-virginica/analyses\n" ] } ], "source": [ "cd ../analyses/" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#!mkdir 2018-11-01-DML-and-DMR-Analysis" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Users/yaamini/Documents/yaamini-virginica/analyses/2018-11-01-DML-and-DMR-Analysis\n" ] } ], "source": [ "cd 2018-11-01-DML-and-DMR-Analysis/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 0b. Download Genome Feature Files" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I will be using the following tracks:\n", "\n", "1. Exon: Coding regions\n", "2. 
Intron: Regions that are removed during splicing\n", "3. Genes: This includes exons and introns, as well as constituent mRNA.\n", "4. Transposable elements (all): Transposable elements located using information from all species in the RepeatMasker database (see [Sam's notes](http://onsnetwork.org/kubu4/2018/08/28/transposable-element-mapping-crassostrea-virginica-genome-cvirginica_v300-using-repeatmasker-4-07/) for more information)\n", "5. Transposable elements (_C. gigas_): Transposable elements located using information from _C. gigas_ only (see [Sam's notes](http://onsnetwork.org/kubu4/2018/08/28/transposable-element-mapping-crassostrea-virginica-genome-cvirginica_v300-using-repeatmasker-4-07/) for more information)\n", "6. CG motifs: Regions with CGs where methylation can occur" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!curl https://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2019-05-13-Yaamini-Virginica-Repository/analyses/2019-05-13-Generating-Genome-Feature-Tracks/C_virginica-3.0_Gnomon_exon_sorted_yrv.bed > C_virginica-3.0_Gnomon_exon_sorted_yrv.bed" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!curl https://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2019-05-13-Yaamini-Virginica-Repository/analyses/2019-05-13-Generating-Genome-Feature-Tracks/C_virginica-3.0_Gnomon_intron_yrv.bed > C_virginica-3.0_Gnomon_intron_yrv.bed" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!curl https://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2019-05-13-Yaamini-Virginica-Repository/analyses/2019-05-13-Generating-Genome-Feature-Tracks/C_virginica-3.0_Gnomon_gene_sorted_yrv.bed > C_virginica-3.0_Gnomon_gene_sorted_yrv.bed" ] }, { "cell_type": "code", "execution_count": 9,
"metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " % Total % Received % Xferd Average Speed Time Time Time Current\n", " Dload Upload Total Spent Left Speed\n", "100 63.0M 100 63.0M 0 0 45.2M 0 0:00:01 0:00:01 --:--:-- 45.3M\n" ] } ], "source": [ "!curl http://owl.fish.washington.edu/halfshell/genomic-databank/C_virginica-3.0_TE-all.gff > C_virginica-3.0_TE-all.gff" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " % Total % Received % Xferd Average Speed Time Time Time Current\n", " Dload Upload Total Spent Left Speed\n", "100 57.4M 100 57.4M 0 0 47.4M 0 0:00:01 0:00:01 --:--:-- 47.5M\n" ] } ], "source": [ "!curl http://owl.fish.washington.edu/halfshell/genomic-databank/C_virginica-3.0_TE-Cg.gff > C_virginica-3.0_TE-Cg.gff" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " % Total % Received % Xferd Average Speed Time Time Time Current\n", " Dload Upload Total Spent Left Speed\n", "100 26.4M 100 26.4M 0 0 63.4M 0 --:--:-- --:--:-- --:--:-- 64.0M\n" ] } ], "source": [ "!curl https://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2019-05-13-Yaamini-Virginica-Repository/analyses/2019-05-13-Generating-Genome-Feature-Tracks/C_virginica-3.0_Gnomon_mRNA_yrv.gff3 > C_virginica-3.0_Gnomon_mRNA_yrv.gff3" ] }, { "cell_type": "code", "execution_count": 104, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "C_virginica-3.0_CG-motif.bed\r\n", "C_virginica-3.0_CG-motif.bed.idx\r\n", "\u001b[31mC_virginica-3.0_Gnomon_exon_sorted_yrv.bed\u001b[m\u001b[m\r\n", "\u001b[31mC_virginica-3.0_Gnomon_gene_sorted_yrv.bed\u001b[m\u001b[m\r\n", "\u001b[31mC_virginica-3.0_Gnomon_intron_yrv.bed\u001b[m\u001b[m\r\n", 
"\u001b[31mC_virginica-3.0_Gnomon_mRNA_yrv.bed\u001b[m\u001b[m\r\n", "C_virginica-3.0_TE-Cg.gff\r\n", "C_virginica-3.0_TE-all.gff\r\n" ] } ], "source": [ "!ls C_virginica*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Locate Relevant Files and Set Variable Path Names" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1a. Set Variable Path Names" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Setting the variable path names allows me to reuse this script with different input files or different paths to programs without manually changing the file names each time." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "bedtoolsDirectory = \"/Users/yaamini/bedtools2/bin/\"" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "DMLlist = \"../../analyses/2018-10-25-MethylKit/2019-04-05-DML-Destrand-5x-Locations.bed\"" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "hyperDML = \"../../analyses/2018-10-25-MethylKit/2019-04-05-DML-Destrand-5x-Locations-Hypermethylated.bed\"" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": true }, "outputs": [], "source": [ "hypoDML = \"../../analyses/2018-10-25-MethylKit/2019-04-05-DML-Destrand-5x-Locations-Hypomethylated.bed\"" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "DMLBackground = \"../2018-10-25-MethylKit/2019-05-14-Methylation-Information-Filtered-Destrand-Cov5.bed\"" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": true }, "outputs": [], "source": [ "DMRlist = \"../../analyses/2018-10-25-MethylKit/2019-06-05-DMR-Locations.bed\"" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": true }, "outputs": [], "source": [ "hyperDMR = 
\"../../analyses/2018-10-25-MethylKit/2019-06-05-DMR-Destrand-5x-Locations-Tiles100-Hypermethylated.bed\"" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": true }, "outputs": [], "source": [ "hypoDMR = \"../../analyses/2018-10-25-MethylKit/2019-06-05-DMR-Destrand-5x-Locations-Tiles100-Hypomethylated.bed\"" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": true }, "outputs": [], "source": [ "DMRBackground = \"../../analyses/2018-10-25-MethylKit/2019-06-05-Methylation-Information-Filtered-Destrand-Cov5-Tiles100.bed\"" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [], "source": [ "exonList = \"C_virginica-3.0_Gnomon_exon_sorted_yrv.bed\"" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": true }, "outputs": [], "source": [ "intronList = \"C_virginica-3.0_Gnomon_intron_yrv.bed\"" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": true }, "outputs": [], "source": [ "geneList = \"../2019-05-13-Generating-Genome-Feature-Tracks/C_virginica-3.0_Gnomon_gene_sorted_yrv.gff3\"" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": true }, "outputs": [], "source": [ "transposableElementsAll = \"C_virginica-3.0_TE-all.gff\"" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": true }, "outputs": [], "source": [ "transposableElementsCg = \"C_virginica-3.0_TE-Cg.gff\"" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": true }, "outputs": [], "source": [ "CGMotifList = \"C_virginica-3.0_CG-motif.bed\"" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": true }, "outputs": [], "source": [ "mRNAList = \"C_virginica-3.0_Gnomon_mRNA_yrv.gff3\"" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": true }, "outputs": [], "source": [ "exonUTR = 
\"../2019-05-13-Generating-Genome-Feature-Tracks/C_virginica-3.0_Gnomon_exonUTR_yrv.gff3\"" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": true }, "outputs": [], "source": [ "CDS = \"../2019-05-13-Generating-Genome-Feature-Tracks/C_virginica-3.0_Gnomon_CSD_sorted_yrv.bed\"" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": true }, "outputs": [], "source": [ "nonCDS = \"../2019-05-13-Generating-Genome-Feature-Tracks/C_virginica-3.0_Gnomon_noncoding_yrv.gff3\"" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": true }, "outputs": [], "source": [ "lncRNA = \"../2019-05-13-Generating-Genome-Feature-Tracks/C_virginica-3.0_Gnomon_lncRNA_yrv.gff3\"" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": true }, "outputs": [], "source": [ "intergenic = \"../2019-05-13-Generating-Genome-Feature-Tracks/C_virginica-3.0_Gnomon_intergenic_yrv.gff3\"" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": true }, "outputs": [], "source": [ "methylationIslands = \"../2019-03-18-Characterizing-CpG-Methylation/2020-02-06-Methylation-Islands-500_0.02_50-filtered.tab.bed\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1b. Confirm Variable Paths Work and Characterize Files" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The BED files with DML and DMR can be viewed below. Columns are the chromosome, start position, end position, and methylation difference with direction (positive values are hypermethylated, negative values are hypomethylated). The combined DMR file also has a `DMR` label in its fourth column. The files only include DML and DMR with at least a 50% methylation difference between the two treatments (control and elevated pCO2)."
] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t571138\t571140\t58\r\n", "NC_035780.1\t1882691\t1882693\t64\r\n", "NC_035780.1\t1885022\t1885024\t61\r\n", "NC_035780.1\t1933499\t1933501\t51\r\n", "NC_035780.1\t1958998\t1959000\t50\r\n", "NC_035780.1\t2538924\t2538926\t-50\r\n", "NC_035780.1\t2541726\t2541728\t-54\r\n", "NC_035780.1\t2584492\t2584494\t56\r\n", "NC_035780.1\t2586508\t2586510\t-53\r\n", "NC_035780.1\t2588794\t2588796\t-53\r\n" ] } ], "source": [ "#Previewing the files\n", "!head {DMLlist}" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 598 ../../analyses/2018-10-25-MethylKit/2019-04-05-DML-Destrand-5x-Locations.bed\r\n" ] } ], "source": [ "#Counting the number of lines to count DML\n", "!wc -l {DMLlist}" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 310 ../../analyses/2018-10-25-MethylKit/2019-04-05-DML-Destrand-5x-Locations-Hypermethylated.bed\r\n" ] } ], "source": [ "!wc -l {hyperDML}" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t401630\t401632\t53\r\n", "NC_035780.1\t571138\t571140\t58\r\n", "NC_035780.1\t1882691\t1882693\t64\r\n", "NC_035780.1\t1885022\t1885024\t61\r\n", "NC_035780.1\t1933499\t1933501\t51\r\n", "NC_035780.1\t2584492\t2584494\t56\r\n", "NC_035780.1\t2589720\t2589722\t57\r\n", "NC_035780.1\t4286286\t4286288\t67\r\n", "NC_035780.1\t8833124\t8833126\t60\r\n", "NC_035780.1\t12631453\t12631455\t60\r\n" ] } ], "source": [ "!head {hyperDML}" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": 
"stream", "text": [ " 288 ../../analyses/2018-10-25-MethylKit/2019-04-05-DML-Destrand-5x-Locations-Hypomethylated.bed\r\n" ] } ], "source": [ "!wc -l {hypoDML}" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t2538924\t2538926\t-50\r\n", "NC_035780.1\t2541726\t2541728\t-54\r\n", "NC_035780.1\t2586508\t2586510\t-53\r\n", "NC_035780.1\t4286802\t4286804\t-62\r\n", "NC_035780.1\t4288213\t4288215\t-58\r\n", "NC_035780.1\t4289628\t4289630\t-52\r\n", "NC_035780.1\t8693287\t8693289\t-52\r\n", "NC_035780.1\t9110274\t9110276\t-63\r\n", "NC_035780.1\t17093218\t17093220\t-52\r\n", "NC_035780.1\t17488958\t17488960\t-57\r\n" ] } ], "source": [ "!head {hypoDML}" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t571100\t571200\tDMR\t58\r\n", "NC_035780.1\t1885000\t1885100\tDMR\t50\r\n", "NC_035780.1\t1933500\t1933600\tDMR\t53\r\n", "NC_035780.1\t2538900\t2539000\tDMR\t-50\r\n", "NC_035780.1\t22276700\t22276800\tDMR\t56\r\n", "NC_035780.1\t28563400\t28563500\tDMR\t61\r\n", "NC_035780.1\t31302900\t31303000\tDMR\t-60\r\n", "NC_035780.1\t35969100\t35969200\tDMR\t-53\r\n", "NC_035780.1\t38236400\t38236500\tDMR\t50\r\n", "NC_035781.1\t5386400\t5386500\tDMR\t51\r\n" ] } ], "source": [ "!head {DMRlist}" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 71 ../../analyses/2018-10-25-MethylKit/2019-06-05-DMR-Locations.bed\r\n" ] } ], "source": [ "!wc -l {DMRlist}" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t571100\t571201\t58\r\n", "NC_035780.1\t1885000\t1885101\t50\r\n", "NC_035780.1\t1933500\t1933601\t53\r\n", 
"NC_035780.1\t22276700\t22276801\t56\r\n", "NC_035780.1\t28563400\t28563501\t61\r\n", "NC_035780.1\t38236400\t38236501\t50\r\n", "NC_035781.1\t5386400\t5386501\t51\r\n", "NC_035781.1\t24474500\t24474601\t53\r\n", "NC_035781.1\t43942600\t43942701\t52\r\n", "NC_035781.1\t45110100\t45110201\t71\r\n" ] } ], "source": [ "!head {hyperDMR}" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 37 ../../analyses/2018-10-25-MethylKit/2019-06-05-DMR-Destrand-5x-Locations-Tiles100-Hypermethylated.bed\r\n" ] } ], "source": [ "!wc -l {hyperDMR}" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t2538900\t2539001\t-50\r\n", "NC_035780.1\t31302900\t31303001\t-60\r\n", "NC_035780.1\t35969100\t35969201\t-53\r\n", "NC_035781.1\t7626500\t7626601\t-56\r\n", "NC_035781.1\t13281000\t13281101\t-57\r\n", "NC_035781.1\t20126000\t20126101\t-52\r\n", "NC_035781.1\t30789600\t30789701\t-57\r\n", "NC_035781.1\t43054100\t43054201\t-60\r\n", "NC_035781.1\t45110200\t45110301\t-51\r\n", "NC_035781.1\t59605700\t59605801\t-54\r\n" ] } ], "source": [ "!head {hypoDMR}" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 34 ../../analyses/2018-10-25-MethylKit/2019-06-05-DMR-Destrand-5x-Locations-Tiles100-Hypomethylated.bed\r\n" ] } ], "source": [ "!wc -l {hypoDMR}" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_007175.2\t101\t200\t*\r\n", "NC_007175.2\t601\t700\t*\r\n", "NC_007175.2\t1501\t1600\t*\r\n", "NC_007175.2\t2201\t2300\t*\r\n", "NC_007175.2\t3301\t3400\t*\r\n", "NC_007175.2\t4801\t4900\t*\r\n", "NC_007175.2\t5301\t5400\t*\r\n", "NC_007175.2\t5401\t5500\t*\r\n", 
"NC_007175.2\t5501\t5600\t*\r\n", "NC_007175.2\t6001\t6100\t*\r\n" ] } ], "source": [ "!head {DMRBackground}" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 152226 ../../analyses/2018-10-25-MethylKit/2019-06-05-Methylation-Information-Filtered-Destrand-Cov5-Tiles100.bed\r\n" ] } ], "source": [ "!wc -l {DMRBackground}" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t13578\t13603\r\n", "NC_035780.1\t14237\t14290\r\n", "NC_035780.1\t14557\t14594\r\n", "NC_035780.1\t28961\t29073\r\n", "NC_035780.1\t30524\t31557\r\n", "NC_035780.1\t31736\t31887\r\n", "NC_035780.1\t31977\t32565\r\n", "NC_035780.1\t32959\t33324\r\n", "NC_035780.1\t43111\t44358\r\n", "NC_035780.1\t43111\t44358\r\n" ] } ], "source": [ "!head {exonList}" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 731279 C_virginica-3.0_Gnomon_exon_sorted_yrv.bed\r\n" ] } ], "source": [ "!wc -l {exonList}" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t13603\t14236\r\n", "NC_035780.1\t14290\t14556\r\n", "NC_035780.1\t29073\t30523\r\n", "NC_035780.1\t31557\t31735\r\n", "NC_035780.1\t31887\t31976\r\n", "NC_035780.1\t32565\t32958\r\n", "NC_035780.1\t44358\t45912\r\n", "NC_035780.1\t46506\t64122\r\n", "NC_035780.1\t64334\t66868\r\n", "NC_035780.1\t85777\t88422\r\n" ] } ], "source": [ "!head {intronList}" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 316614 C_virginica-3.0_Gnomon_intron_yrv.bed\r\n" ] } ], "source": [ "!wc -l {intronList}" ] }, { "cell_type": 
"code", "execution_count": 24, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t13578\t14594\r\n", "NC_035780.1\t28961\t33324\r\n", "NC_035780.1\t43111\t66897\r\n", "NC_035780.1\t85606\t95254\r\n", "NC_035780.1\t99840\t106460\r\n", "NC_035780.1\t108305\t110077\r\n", "NC_035780.1\t151859\t157536\r\n", "NC_035780.1\t163809\t183798\r\n", "NC_035780.1\t164820\t166793\r\n", "NC_035780.1\t169468\t170178\r\n" ] } ], "source": [ "!head {geneList}" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 38929 C_virginica-3.0_Gnomon_gene_sorted_yrv.bed\r\n" ] } ], "source": [ "!wc -l {geneList}" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "##gff-version 2\r\n", "##date 2018-08-23\r\n", "##sequence-region Cvirginica_v300.fa\r\n", "NC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n", "NC_007175.2\tRepeatMasker\tsimilarity\t1728\t1947\t26.1\t-\t.\tTarget \"Motif:REP-6_LMi\" 14320 14534\r\n", "NC_007175.2\tRepeatMasker\tsimilarity\t1866\t2013\t33.6\t+\t.\tTarget \"Motif:LSU-rRNA_Cel\" 2372 2520\r\n", "NC_007175.2\tRepeatMasker\tsimilarity\t2129\t2367\t20.5\t-\t.\tTarget \"Motif:REP-6_LMi\" 13886 14118\r\n", "NC_007175.2\tRepeatMasker\tsimilarity\t2836\t2980\t31.5\t+\t.\tTarget \"Motif:REP-6_LMi\" 6216 6359\r\n", "NC_007175.2\tRepeatMasker\tsimilarity\t3196\t3277\t30.5\t+\t.\tTarget \"Motif:REP-6_LMi\" 6572 6653\r\n", "NC_007175.2\tRepeatMasker\tsimilarity\t5168\t5532\t32.9\t+\t.\tTarget \"Motif:REP-6_LMi\" 4620 4983\r\n" ] } ], "source": [ "!head {transposableElementsAll}" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", 
"text": [ " 692371 C_virginica-3,0_TE-all.gff\r\n" ] } ], "source": [ "!wc -l {transposableElementsAll}" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "##gff-version 2\r\n", "##date 2018-08-27\r\n", "##sequence-region Cvirginica_v300.fa\r\n", "NC_007175.2\tRepeatMasker\tsimilarity\t1866\t2013\t33.6\t+\t.\tTarget \"Motif:LSU-rRNA_Cel\" 2372 2520\r\n", "NC_007175.2\tRepeatMasker\tsimilarity\t6529\t6628\t19.0\t+\t.\tTarget \"Motif:(TA)n\" 2 102\r\n", "NC_035780.1\tRepeatMasker\tsimilarity\t1473\t1535\t 0.0\t+\t.\tTarget \"Motif:(TAACCC)n\" 1 63\r\n", "NC_035780.1\tRepeatMasker\tsimilarity\t5080\t7289\t32.5\t-\t.\tTarget \"Motif:Gypsy-62_CGi-I\" 2102 4631\r\n", "NC_035780.1\tRepeatMasker\tsimilarity\t7423\t7489\t25.4\t-\t.\tTarget \"Motif:Gypsy-62_CGi-I\" 2097 2163\r\n", "NC_035780.1\tRepeatMasker\tsimilarity\t7623\t8079\t34.1\t-\t.\tTarget \"Motif:Gypsy-62_CGi-I\" 1516 1975\r\n", "NC_035780.1\tRepeatMasker\tsimilarity\t8261\t8295\t14.1\t+\t.\tTarget \"Motif:(CTCCT)n\" 1 33\r\n" ] } ], "source": [ "!head {transposableElementsCg}" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 626665 C_virginica-3.0_TE-Cg.gff\r\n" ] } ], "source": [ "!wc -l {transposableElementsCg}" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t28\t30\tCG_motif\r\n", "NC_035780.1\t54\t56\tCG_motif\r\n", "NC_035780.1\t75\t77\tCG_motif\r\n", "NC_035780.1\t93\t95\tCG_motif\r\n", "NC_035780.1\t103\t105\tCG_motif\r\n", "NC_035780.1\t116\t118\tCG_motif\r\n", "NC_035780.1\t134\t136\tCG_motif\r\n", "NC_035780.1\t159\t161\tCG_motif\r\n", "NC_035780.1\t209\t211\tCG_motif\r\n", "NC_035780.1\t224\t226\tCG_motif\r\n" ] } ], "source": [ "!head {CGMotifList}" ] }, { "cell_type": 
"code", "execution_count": 26, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 14458703 C_virginica-3.0_CG-motif.bed\r\n" ] } ], "source": [ "!wc -l {CGMotifList}" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\tGnomon\tmRNA\t28961\t33324\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n", "NC_035780.1\tGnomon\tmRNA\t43111\t66897\t.\t-\t.\tID=rna2;Parent=gene2;Dbxref=GeneID:111110729,Genbank:XM_022447324.1;Name=XM_022447324.1;gbkey=mRNA;gene=LOC111110729;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments;product=FMRFamide receptor-like%2C transcript variant X1;transcript_id=XM_022447324.1\r\n", "NC_035780.1\tGnomon\tmRNA\t43111\t46506\t.\t-\t.\tID=rna3;Parent=gene2;Dbxref=GeneID:111110729,Genbank:XM_022447333.1;Name=XM_022447333.1;gbkey=mRNA;gene=LOC111110729;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 14 samples with support for all annotated introns;product=FMRFamide receptor-like%2C transcript variant X2;transcript_id=XM_022447333.1\r\n", "NC_035780.1\tGnomon\tmRNA\t85606\t95254\t.\t-\t.\tID=rna4;Parent=gene3;Dbxref=GeneID:111112434,Genbank:XM_022449924.1;Name=XM_022449924.1;gbkey=mRNA;gene=LOC111112434;model_evidence=Supporting evidence includes similarity to: 7 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq 
alignments%2C including 13 samples with support for all annotated introns;product=homeobox protein Hox-B7-like;transcript_id=XM_022449924.1\r\n", "NC_035780.1\tGnomon\tmRNA\t99840\t106460\t.\t+\t.\tID=rna5;Parent=gene4;Dbxref=GeneID:111120752,Genbank:XM_022461698.1;Name=XM_022461698.1;gbkey=mRNA;gene=LOC111120752;model_evidence=Supporting evidence includes similarity to: 10 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 27 samples with support for all annotated introns;product=ribulose-phosphate 3-epimerase-like;transcript_id=XM_022461698.1\r\n", "NC_035780.1\tGnomon\tmRNA\t108305\t110077\t.\t-\t.\tID=rna6;Parent=gene5;Dbxref=GeneID:111128944,Genbank:XM_022474921.1;Name=XM_022474921.1;gbkey=mRNA;gene=LOC111128944;model_evidence=Supporting evidence includes similarity to: 2 Proteins%2C and 93%25 coverage of the annotated genomic feature by RNAseq alignments;partial=true;product=mucin-19-like;start_range=.,108305;transcript_id=XM_022474921.1\r\n", "NC_035780.1\tGnomon\tmRNA\t151859\t157536\t.\t+\t.\tID=rna7;Parent=gene6;Dbxref=GeneID:111128953,Genbank:XM_022474931.1;Name=XM_022474931.1;gbkey=mRNA;gene=LOC111128953;model_evidence=Supporting evidence includes similarity to: 1 Protein;product=GATA zinc finger domain-containing protein 14-like;transcript_id=XM_022474931.1\r\n", "NC_035780.1\tGnomon\tmRNA\t163809\t183798\t.\t-\t.\tID=rna8;Parent=gene7;Dbxref=GeneID:111105691,Genbank:XM_022440054.1;Name=XM_022440054.1;gbkey=mRNA;gene=LOC111105691;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 9 samples with support for all annotated introns;product=uncharacterized LOC111105691;transcript_id=XM_022440054.1\r\n", "NC_035780.1\tGnomon\tmRNA\t164820\t166793\t.\t+\t.\tID=rna9;Parent=gene8;Dbxref=GeneID:111105685,Genbank:XM_022440042.1;Name=XM_022440042.1;gbkey=mRNA;gene=LOC111105685;model_evidence=Supporting evidence includes 
similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 4 samples with support for all annotated introns;product=protein ANTAGONIST OF LIKE HETEROCHROMATIN PROTEIN 1-like;transcript_id=XM_022440042.1\r\n", "NC_035780.1\tGnomon\tmRNA\t190449\t193594\t.\t-\t.\tID=rna11;Parent=gene10;Dbxref=GeneID:111133554,Genbank:XM_022482070.1;Name=XM_022482070.1;gbkey=mRNA;gene=LOC111133554;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 3 samples with support for all annotated introns;product=putative uncharacterized protein DDB_G0277407;transcript_id=XM_022482070.1\r\n" ] } ], "source": [ "!head {mRNAList}" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 60201 C_virginica-3.0_Gnomon_mRNA_yrv.gff3\r\n" ] } ], "source": [ "!wc -l {mRNAList}" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\tGnomon\texon\t13578\t13603\t.\t+\t.\tID=id1;Parent=rna0;Dbxref=GeneID:111116054,Genbank:XR_002636969.1;gbkey=ncRNA;gene=LOC111116054;product=uncharacterized LOC111116054;transcript_id=XR_002636969.1\r\n", "NC_035780.1\tGnomon\texon\t14237\t14290\t.\t+\t.\tID=id2;Parent=rna0;Dbxref=GeneID:111116054,Genbank:XR_002636969.1;gbkey=ncRNA;gene=LOC111116054;product=uncharacterized LOC111116054;transcript_id=XR_002636969.1\r\n", "NC_035780.1\tGnomon\texon\t14557\t14594\t.\t+\t.\tID=id3;Parent=rna0;Dbxref=GeneID:111116054,Genbank:XR_002636969.1;gbkey=ncRNA;gene=LOC111116054;product=uncharacterized LOC111116054;transcript_id=XR_002636969.1\r\n", "NC_035780.1\tGnomon\texon\t28961\t29073\t.\t+\t.\tID=id4;Parent=rna1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;gbkey=mRNA;gene=LOC111126949;product=UNC5C-like 
protein;transcript_id=XM_022471938.1\r\n", "NC_035780.1\tGnomon\texon\t30524\t30534\t.\t+\t.\tID=id5;Parent=rna1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;gbkey=mRNA;gene=LOC111126949;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n", "NC_035780.1\tGnomon\texon\t33205\t33324\t.\t+\t.\tID=id8;Parent=rna1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;gbkey=mRNA;gene=LOC111126949;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n", "NC_035780.1\tGnomon\texon\t43111\t43261\t.\t-\t.\tID=id11;Parent=rna2;Dbxref=GeneID:111110729,Genbank:XM_022447324.1;gbkey=mRNA;gene=LOC111110729;product=FMRFamide receptor-like%2C transcript variant X1;transcript_id=XM_022447324.1\r\n", "NC_035780.1\tGnomon\texon\t43111\t43261\t.\t-\t.\tID=id13;Parent=rna3;Dbxref=GeneID:111110729,Genbank:XM_022447333.1;gbkey=mRNA;gene=LOC111110729;product=FMRFamide receptor-like%2C transcript variant X2;transcript_id=XM_022447333.1\r\n", "NC_035780.1\tGnomon\texon\t45998\t46506\t.\t-\t.\tID=id12;Parent=rna3;Dbxref=GeneID:111110729,Genbank:XM_022447333.1;gbkey=mRNA;gene=LOC111110729;product=FMRFamide receptor-like%2C transcript variant X2;transcript_id=XM_022447333.1\r\n", "NC_035780.1\tGnomon\texon\t64220\t64334\t.\t-\t.\tID=id10;Parent=rna2;Dbxref=GeneID:111110729,Genbank:XM_022447324.1;gbkey=mRNA;gene=LOC111110729;product=FMRFamide receptor-like%2C transcript variant X1;transcript_id=XM_022447324.1\r\n" ] } ], "source": [ "!head {exonUTR}" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 182752 ../2019-05-13-Generating-Genome-Feature-Tracks/C_virginica-3.0_Gnomon_exonUTR_yrv.gff3\r\n" ] } ], "source": [ "!wc -l {exonUTR}" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t30535\t31557\r\n", "NC_035780.1\t31736\t31887\r\n", "NC_035780.1\t31977\t32565\r\n", 
"NC_035780.1\t32959\t33204\r\n", "NC_035780.1\t43262\t44358\r\n", "NC_035780.1\t43262\t44358\r\n", "NC_035780.1\t45913\t45997\r\n", "NC_035780.1\t64123\t64219\r\n", "NC_035780.1\t85616\t85777\r\n", "NC_035780.1\t88423\t88589\r\n" ] } ], "source": [ "!head {CDS}" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 645355 ../2019-05-13-Generating-Genome-Feature-Tracks/C_virginica-3.0_Gnomon_CSD_sorted_yrv.bed\r\n" ] } ], "source": [ "!wc -l {CDS}" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t0\t13577\r\n", "NC_035780.1\t13603\t14236\r\n", "NC_035780.1\t14290\t14556\r\n", "NC_035780.1\t14594\t28960\r\n", "NC_035780.1\t29073\t30523\r\n", "NC_035780.1\t31557\t31735\r\n", "NC_035780.1\t31887\t31976\r\n", "NC_035780.1\t32565\t32958\r\n", "NC_035780.1\t33324\t43110\r\n", "NC_035780.1\t44358\t45912\r\n" ] } ], "source": [ "!head {nonCDS}" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 336677 ../2019-05-13-Generating-Genome-Feature-Tracks/C_virginica-3.0_Gnomon_noncoding_yrv.gff3\r\n" ] } ], "source": [ "!wc -l {nonCDS}" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\tGnomon\tlnc_RNA\t13578\t14594\t.\t+\t.\tID=rna0;Parent=gene0;Dbxref=GeneID:111116054,Genbank:XR_002636969.1;Name=XR_002636969.1;gbkey=ncRNA;gene=LOC111116054;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 1 sample with support for all annotated introns;product=uncharacterized LOC111116054;transcript_id=XR_002636969.1\n", 
"NC_035780.1\tGnomon\tlnc_RNA\t169468\t170178\t.\t-\t.\tID=rna10;Parent=gene9;Dbxref=GeneID:111105702,Genbank:XR_002635081.1;Name=XR_002635081.1;gbkey=ncRNA;gene=LOC111105702;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 3 samples with support for all annotated introns;product=uncharacterized LOC111105702;transcript_id=XR_002635081.1\n", "NC_035780.1\tGnomon\tlnc_RNA\t900326\t903430\t.\t+\t.\tID=rna105;Parent=gene57;Dbxref=GeneID:111111519,Genbank:XR_002636046.1;Name=XR_002636046.1;gbkey=ncRNA;gene=LOC111111519;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 20 samples with support for all annotated introns;product=uncharacterized LOC111111519;transcript_id=XR_002636046.1\n", "NC_035780.1\tGnomon\tlnc_RNA\t1280831\t1282416\t.\t-\t.\tID=rna130;Parent=gene71;Dbxref=GeneID:111124195,Genbank:XR_002638148.1;Name=XR_002638148.1;gbkey=ncRNA;gene=LOC111124195;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 1 sample with support for all annotated introns;product=uncharacterized LOC111124195;transcript_id=XR_002638148.1\n", "NC_035780.1\tGnomon\tlnc_RNA\t1432944\t1458091\t.\t+\t.\tID=rna135;Parent=gene76;Dbxref=GeneID:111135942,Genbank:XR_002639675.1;Name=XR_002639675.1;gbkey=ncRNA;gene=LOC111135942;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 4 samples with support for all annotated introns;product=uncharacterized LOC111135942;transcript_id=XR_002639675.1\n", "NC_035780.1\tGnomon\tlnc_RNA\t1503802\t1513830\t.\t-\t.\tID=rna137;Parent=gene78;Dbxref=GeneID:111114441,Genbank:XR_002636574.1;Name=XR_002636574.1;gbkey=ncRNA;gene=LOC111114441;model_evidence=Supporting evidence includes similarity to: 100%25 
coverage of the annotated genomic feature by RNAseq alignments%2C including 2 samples with support for all annotated introns;product=uncharacterized LOC111114441;transcript_id=XR_002636574.1\n", "NC_035780.1\tGnomon\tlnc_RNA\t1856841\t1863697\t.\t-\t.\tID=rna151;Parent=gene92;Dbxref=GeneID:111115591,Genbank:XR_002636863.1;Name=XR_002636863.1;gbkey=ncRNA;gene=LOC111115591;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 1 sample with support for all annotated introns;product=uncharacterized LOC111115591%2C transcript variant X1;transcript_id=XR_002636863.1\n", "NC_035780.1\tGnomon\tlnc_RNA\t1856841\t1863683\t.\t-\t.\tID=rna152;Parent=gene92;Dbxref=GeneID:111115591,Genbank:XR_002636864.1;Name=XR_002636864.1;gbkey=ncRNA;gene=LOC111115591;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments;product=uncharacterized LOC111115591%2C transcript variant X2;transcript_id=XR_002636864.1\n", "NC_035780.1\tGnomon\tlnc_RNA\t2161223\t2166803\t.\t+\t.\tID=rna188;Parent=gene111;Dbxref=GeneID:111109763,Genbank:XR_002635698.1;Name=XR_002635698.1;gbkey=ncRNA;gene=LOC111109763;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 23 samples with support for all annotated introns;product=uncharacterized LOC111109763;transcript_id=XR_002635698.1\n", "NC_035780.1\tGnomon\tlnc_RNA\t2928484\t2930094\t.\t-\t.\tID=rna249;Parent=gene150;Dbxref=GeneID:111122009,Genbank:XR_002637875.1;Name=XR_002637875.1;gbkey=ncRNA;gene=LOC111122009;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 5 samples with support for all annotated introns;product=uncharacterized LOC111122009;transcript_id=XR_002637875.1\n" ] } ], "source": [ "!head {lncRNA}" ] }, { "cell_type": 
"code", "execution_count": 35, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 4750 ../2019-05-13-Generating-Genome-Feature-Tracks/C_virginica-3.0_Gnomon_lncRNA_yrv.gff3\r\n" ] } ], "source": [ "!wc -l {lncRNA}" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t0\t13577\r\n", "NC_035780.1\t14594\t28960\r\n", "NC_035780.1\t33324\t43110\r\n", "NC_035780.1\t66897\t85605\r\n", "NC_035780.1\t95254\t99839\r\n", "NC_035780.1\t106460\t108304\r\n", "NC_035780.1\t110077\t151858\r\n", "NC_035780.1\t157536\t163808\r\n", "NC_035780.1\t183798\t190448\r\n", "NC_035780.1\t193594\t204242\r\n" ] } ], "source": [ "!head {intergenic}" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 34557 ../2019-05-13-Generating-Genome-Feature-Tracks/C_virginica-3.0_Gnomon_intergenic_yrv.gff3\r\n" ] } ], "source": [ "!wc -l {intergenic}" ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t100558\t101923\r\n", "NC_035780.1\t102593\t103702\r\n", "NC_035780.1\t245717\t248838\r\n", "NC_035780.1\t250197\t351003\r\n", "NC_035780.1\t353355\t356963\r\n", "NC_035780.1\t369554\t378352\r\n", "NC_035780.1\t380654\t423774\r\n", "NC_035780.1\t449440\t450158\r\n", "NC_035780.1\t471401\t472285\r\n", "NC_035780.1\t529221\t530454\r\n" ] } ], "source": [ "!head {methylationIslands}" ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 37063 ../2019-03-18-Characterizing-CpG-Methylation/2020-02-06-Methylation-Islands-500_0.02_50-filtered.tab.bed\r\n" ] } ], "source": [ "!wc -l {methylationIslands}" ] }, { "cell_type": 
"markdown", "metadata": {}, "source": [ "## 2. Identify DML and DMR Overlaps with Genomic Feature Tracks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To identify the location of DML and DMR in the *C. virginica* genome, I will use `intersect` from `bedtools`. [The BEDtools suite](http://bedtools.readthedocs.io/en/latest/content/bedtools-suite.html) allows me to easily find overlapping regions of different bed files." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\r\n", "Tool: bedtools intersect (aka intersectBed)\r\n", "Version: v2.29.1\r\n", "Summary: Report overlaps between two feature files.\r\n", "\r\n", "Usage: bedtools intersect [OPTIONS] -a -b \r\n", "\r\n", "\tNote: -b may be followed with multiple databases and/or \r\n", "\twildcard (*) character(s). \r\n", "Options: \r\n", "\t-wa\tWrite the original entry in A for each overlap.\r\n", "\r\n", "\t-wb\tWrite the original entry in B for each overlap.\r\n", "\t\t- Useful for knowing _what_ A overlaps. Restricted by -f and -r.\r\n", "\r\n", "\t-loj\tPerform a \"left outer join\". That is, for each feature in A\r\n", "\t\treport each overlap with B. 
If no overlaps are found, \r\n", "\t\treport a NULL feature for B.\r\n", "\r\n", "\t-wo\tWrite the original A and B entries plus the number of base\r\n", "\t\tpairs of overlap between the two features.\r\n", "\t\t- Overlaps restricted by -f and -r.\r\n", "\t\t Only A features with overlap are reported.\r\n", "\r\n", "\t-wao\tWrite the original A and B entries plus the number of base\r\n", "\t\tpairs of overlap between the two features.\r\n", "\t\t- Overlapping features restricted by -f and -r.\r\n", "\t\t However, A features w/o overlap are also reported\r\n", "\t\t with a NULL B feature and overlap = 0.\r\n", "\r\n", "\t-u\tWrite the original A entry _once_ if _any_ overlaps found in B.\r\n", "\t\t- In other words, just report the fact >=1 hit was found.\r\n", "\t\t- Overlaps restricted by -f and -r.\r\n", "\r\n", "\t-c\tFor each entry in A, report the number of overlaps with B.\r\n", "\t\t- Reports 0 for A entries that have no overlap with B.\r\n", "\t\t- Overlaps restricted by -f, -F, -r, and -s.\r\n", "\r\n", "\t-C\tFor each entry in A, separately report the number of\r\n", "\t\t- overlaps with each B file on a distinct line.\r\n", "\t\t- Reports 0 for A entries that have no overlap with B.\r\n", "\t\t- Overlaps restricted by -f, -F, -r, and -s.\r\n", "\r\n", "\t-v\tOnly report those entries in A that have _no overlaps_ with B.\r\n", "\t\t- Similar to \"grep -v\" (an homage).\r\n", "\r\n", "\t-ubam\tWrite uncompressed BAM output. Default writes compressed BAM.\r\n", "\r\n", "\t-s\tRequire same strandedness. That is, only report hits in B\r\n", "\t\tthat overlap A on the _same_ strand.\r\n", "\t\t- By default, overlaps are reported without respect to strand.\r\n", "\r\n", "\t-S\tRequire different strandedness. 
That is, only report hits in B\r\n", "\t\tthat overlap A on the _opposite_ strand.\r\n", "\t\t- By default, overlaps are reported without respect to strand.\r\n", "\r\n", "\t-f\tMinimum overlap required as a fraction of A.\r\n", "\t\t- Default is 1E-9 (i.e., 1bp).\r\n", "\t\t- FLOAT (e.g. 0.50)\r\n", "\r\n", "\t-F\tMinimum overlap required as a fraction of B.\r\n", "\t\t- Default is 1E-9 (i.e., 1bp).\r\n", "\t\t- FLOAT (e.g. 0.50)\r\n", "\r\n", "\t-r\tRequire that the fraction overlap be reciprocal for A AND B.\r\n", "\t\t- In other words, if -f is 0.90 and -r is used, this requires\r\n", "\t\t that B overlap 90% of A and A _also_ overlaps 90% of B.\r\n", "\r\n", "\t-e\tRequire that the minimum fraction be satisfied for A OR B.\r\n", "\t\t- In other words, if -e is used with -f 0.90 and -F 0.10 this requires\r\n", "\t\t that either 90% of A is covered OR 10% of B is covered.\r\n", "\t\t Without -e, both fractions would have to be satisfied.\r\n", "\r\n", "\t-split\tTreat \"split\" BAM or BED12 entries as distinct BED intervals.\r\n", "\r\n", "\t-g\tProvide a genome file to enforce consistent chromosome sort order\r\n", "\t\tacross input files. Only applies when used with -sorted option.\r\n", "\r\n", "\t-nonamecheck\tFor sorted data, don't throw an error if the file has different naming conventions\r\n", "\t\t\tfor the same chromosome. ex. 
\"chr1\" vs \"chr01\".\r\n", "\r\n", "\t-sorted\tUse the \"chromsweep\" algorithm for sorted (-k1,1 -k2,2n) input.\r\n", "\r\n", "\t-names\tWhen using multiple databases, provide an alias for each that\r\n", "\t\twill appear instead of a fileId when also printing the DB record.\r\n", "\r\n", "\t-filenames\tWhen using multiple databases, show each complete filename\r\n", "\t\t\tinstead of a fileId when also printing the DB record.\r\n", "\r\n", "\t-sortout\tWhen using multiple databases, sort the output DB hits\r\n", "\t\t\tfor each record.\r\n", "\r\n", "\t-bed\tIf using BAM input, write output as BED.\r\n", "\r\n", "\t-header\tPrint the header from the A file prior to results.\r\n", "\r\n", "\t-nobuf\tDisable buffered output. Using this option will cause each line\r\n", "\t\tof output to be printed as it is generated, rather than saved\r\n", "\t\tin a buffer. This will make printing large output files \r\n", "\t\tnoticeably slower, but can be useful in conjunction with\r\n", "\t\tother software tools and scripts that need to process one\r\n", "\t\tline of bedtools output at a time.\r\n", "\r\n", "\t-iobuf\tSpecify amount of memory to use for input buffer.\r\n", "\t\tTakes an integer argument. Optional suffixes K/M/G supported.\r\n", "\t\tNote: currently has no effect with compressed files.\r\n", "\r\n", "Notes: \r\n", "\t(1) When a BAM file is used for the A file, the alignment is retained if overlaps exist,\r\n", "\tand excluded if an overlap cannot be found. If multiple overlaps exist, they are not\r\n", "\treported, as we are only testing for one or more overlaps.\r\n", "\r\n", "\r\n", "\r\n", "\r\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed -h" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2a. 
Exons" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### All DML" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 368\n", "DML overlaps with exons\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {DMLlist} \\\n", "-b {exonList} \\\n", "| wc -l\n", "!echo \"DML overlaps with exons\"" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false }, "outputs": [], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {DMLlist} \\\n", "-b {exonList} \\\n", "> 2019-05-29-DML-Exon.txt" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t571138\t571140\t58\tNC_035780.1\t570942\t571194\r\n", "NC_035780.1\t2538924\t2538926\t-50\tNC_035780.1\t2538624\t2538955\r\n", "NC_035780.1\t2586508\t2586510\t-53\tNC_035780.1\t2586438\t2586557\r\n", "NC_035780.1\t2589720\t2589722\t57\tNC_035780.1\t2589716\t2589955\r\n", "NC_035780.1\t4286286\t4286288\t67\tNC_035780.1\t4286174\t4286407\r\n", "NC_035780.1\t4286802\t4286804\t-62\tNC_035780.1\t4286783\t4286927\r\n", "NC_035780.1\t4289628\t4289630\t-52\tNC_035780.1\t4288592\t4290756\r\n", "NC_035780.1\t8693287\t8693289\t-52\tNC_035780.1\t8692509\t8693320\r\n", "NC_035780.1\t9110274\t9110276\t-63\tNC_035780.1\t9109982\t9111843\r\n", "NC_035780.1\t12631453\t12631455\t60\tNC_035780.1\t12630576\t12631487\r\n" ] } ], "source": [ "!head 2019-05-29-DML-Exon.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Hypermethylated DML" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 190\n", "hypermethylated DML overlaps with exons\n" ] } ], "source": [ "! 
{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {hyperDML} \\\n", "-b {exonList} \\\n", "| wc -l\n", "!echo \"hypermethylated DML overlaps with exons\"" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": false }, "outputs": [], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {hyperDML} \\\n", "-b {exonList} \\\n", "> 2019-05-29-Hypermethylated-DML-Exon.txt" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t571138\t571140\t58\tNC_035780.1\t570942\t571194\r\n", "NC_035780.1\t2589720\t2589722\t57\tNC_035780.1\t2589716\t2589955\r\n", "NC_035780.1\t4286286\t4286288\t67\tNC_035780.1\t4286174\t4286407\r\n", "NC_035780.1\t12631453\t12631455\t60\tNC_035780.1\t12630576\t12631487\r\n", "NC_035780.1\t12631453\t12631455\t60\tNC_035780.1\t12630576\t12631487\r\n", "NC_035780.1\t12631453\t12631455\t60\tNC_035780.1\t12630577\t12631487\r\n", "NC_035780.1\t12631453\t12631455\t60\tNC_035780.1\t12630577\t12631487\r\n", "NC_035780.1\t15412264\t15412266\t50\tNC_035780.1\t15412219\t15412410\r\n", "NC_035780.1\t15412264\t15412266\t50\tNC_035780.1\t15412219\t15412410\r\n", "NC_035780.1\t15414935\t15414936\t51\tNC_035780.1\t15414935\t15415225\r\n" ] } ], "source": [ "!head 2019-05-29-Hypermethylated-DML-Exon.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Hypomethylated DML" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 178\n", "hypomethylated DML overlaps with exons\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {hypoDML} \\\n", "-b {exonList} \\\n", "| wc -l\n", "!echo \"hypomethylated DML overlaps with exons\"" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": false }, "outputs": [], "source": [ "! 
{bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {hypoDML} \\\n", "-b {exonList} \\\n", "> 2019-05-29-Hypomethylated-DML-Exon.txt" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t2538924\t2538926\t-50\tNC_035780.1\t2538624\t2538955\r\n", "NC_035780.1\t2586508\t2586510\t-53\tNC_035780.1\t2586438\t2586557\r\n", "NC_035780.1\t4286802\t4286804\t-62\tNC_035780.1\t4286783\t4286927\r\n", "NC_035780.1\t4289628\t4289630\t-52\tNC_035780.1\t4288592\t4290756\r\n", "NC_035780.1\t8693287\t8693289\t-52\tNC_035780.1\t8692509\t8693320\r\n", "NC_035780.1\t9110274\t9110276\t-63\tNC_035780.1\t9109982\t9111843\r\n", "NC_035780.1\t17093218\t17093220\t-52\tNC_035780.1\t17092983\t17093548\r\n", "NC_035780.1\t19149580\t19149582\t-61\tNC_035780.1\t19149513\t19149749\r\n", "NC_035780.1\t19149580\t19149582\t-61\tNC_035780.1\t19149513\t19149749\r\n", "NC_035780.1\t19149580\t19149582\t-61\tNC_035780.1\t19149513\t19150486\r\n" ] } ], "source": [ "!head 2019-05-29-Hypomethylated-DML-Exon.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### All DMR" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 38\n", "DMR overlaps with exons\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {DMRlist} \\\n", "-b {exonList} \\\n", "| wc -l\n", "!echo \"DMR overlaps with exons\"" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! 
{bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {DMRlist} \\\n", "-b {exonList} \\\n", "> 2019-06-05-DMR-Exon.txt" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t571100\t571194\tDMR\t58\tNC_035780.1\t570942\t571194\r\n", "NC_035780.1\t1933574\t1933600\tDMR\t53\tNC_035780.1\t1933574\t1933615\r\n", "NC_035780.1\t2538900\t2538955\tDMR\t-50\tNC_035780.1\t2538624\t2538955\r\n", "NC_035780.1\t22276700\t22276800\tDMR\t56\tNC_035780.1\t22275427\t22278631\r\n", "NC_035780.1\t22276700\t22276800\tDMR\t56\tNC_035780.1\t22275427\t22278631\r\n", "NC_035781.1\t5386400\t5386493\tDMR\t51\tNC_035781.1\t5386310\t5386493\r\n", "NC_035781.1\t5386400\t5386493\tDMR\t51\tNC_035781.1\t5386310\t5386493\r\n", "NC_035781.1\t7626500\t7626546\tDMR\t-56\tNC_035781.1\t7626417\t7626546\r\n", "NC_035781.1\t13281000\t13281010\tDMR\t-57\tNC_035781.1\t13280898\t13281010\r\n", "NC_035781.1\t20126000\t20126100\tDMR\t-52\tNC_035781.1\t20125936\t20126403\r\n" ] } ], "source": [ "!head 2019-06-05-DMR-Exon.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Hypermethylated DMR" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 19\n", "hyper DMR overlaps with exons\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {hyperDMR} \\\n", "-b {exonList} \\\n", "| wc -l\n", "!echo \"hyper DMR overlaps with exons\"" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! 
{bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {hyperDMR} \\\n", "-b {exonList} \\\n", "> 2019-06-05-HyperDMR-Exon.txt" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t571100\t571194\t58\tNC_035780.1\t570942\t571194\r\n", "NC_035780.1\t1933574\t1933601\t53\tNC_035780.1\t1933574\t1933615\r\n", "NC_035780.1\t22276700\t22276801\t56\tNC_035780.1\t22275427\t22278631\r\n", "NC_035780.1\t22276700\t22276801\t56\tNC_035780.1\t22275427\t22278631\r\n", "NC_035781.1\t5386400\t5386493\t51\tNC_035781.1\t5386310\t5386493\r\n", "NC_035781.1\t5386400\t5386493\t51\tNC_035781.1\t5386310\t5386493\r\n", "NC_035781.1\t24474587\t24474601\t53\tNC_035781.1\t24474587\t24474788\r\n", "NC_035781.1\t45110100\t45110157\t71\tNC_035781.1\t45109859\t45110157\r\n", "NC_035783.1\t19526000\t19526101\t50\tNC_035783.1\t19525785\t19526784\r\n", "NC_035783.1\t19526000\t19526101\t50\tNC_035783.1\t19525785\t19526784\r\n" ] } ], "source": [ "!head 2019-06-05-HyperDMR-Exon.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Hypomethylated DMR" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 19\n", "hypo DMR overlaps with exons\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {hypoDMR} \\\n", "-b {exonList} \\\n", "| wc -l\n", "!echo \"hypo DMR overlaps with exons\"" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! 
{bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {hypoDMR} \\\n", "-b {exonList} \\\n", "> 2019-06-05-HypoDMR-Exon.txt" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t2538900\t2538955\t-50\tNC_035780.1\t2538624\t2538955\r\n", "NC_035781.1\t7626500\t7626546\t-56\tNC_035781.1\t7626417\t7626546\r\n", "NC_035781.1\t13281000\t13281010\t-57\tNC_035781.1\t13280898\t13281010\r\n", "NC_035781.1\t20126000\t20126101\t-52\tNC_035781.1\t20125936\t20126403\r\n", "NC_035781.1\t30789600\t30789701\t-57\tNC_035781.1\t30789514\t30790310\r\n", "NC_035781.1\t43054149\t43054201\t-60\tNC_035781.1\t43054149\t43054333\r\n", "NC_035783.1\t38039050\t38039101\t-51\tNC_035783.1\t38039050\t38039211\r\n", "NC_035783.1\t38039050\t38039101\t-51\tNC_035783.1\t38039050\t38039211\r\n", "NC_035783.1\t38039050\t38039101\t-51\tNC_035783.1\t38039050\t38039211\r\n", "NC_035783.1\t58980000\t58980101\t-61\tNC_035783.1\t58979191\t58980564\r\n" ] } ], "source": [ "!head 2019-06-05-HypoDMR-Exon.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### DMR Background" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 92552\n", "DMR background overlaps with exons\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {DMRBackground} \\\n", "-b {exonList} \\\n", "| wc -l\n", "!echo \"DMR background overlaps with exons\"" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! 
{bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {DMRBackground} \\\n", "-b {exonList} \\\n", "> 2019-06-05-DMRBackground-Exon.txt" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t100554\t100600\t*\tNC_035780.1\t100554\t100661\r\n", "NC_035780.1\t100601\t100661\t*\tNC_035780.1\t100554\t100661\r\n", "NC_035780.1\t250301\t250400\t*\tNC_035780.1\t250285\t250608\r\n", "NC_035780.1\t250401\t250500\t*\tNC_035780.1\t250285\t250608\r\n", "NC_035780.1\t250501\t250600\t*\tNC_035780.1\t250285\t250608\r\n", "NC_035780.1\t250601\t250608\t*\tNC_035780.1\t250285\t250608\r\n", "NC_035780.1\t258108\t258200\t*\tNC_035780.1\t258108\t259494\r\n", "NC_035780.1\t258201\t258300\t*\tNC_035780.1\t258108\t259494\r\n", "NC_035780.1\t258301\t258400\t*\tNC_035780.1\t258108\t259494\r\n", "NC_035780.1\t258901\t259000\t*\tNC_035780.1\t258108\t259494\r\n" ] } ], "source": [ "!head 2019-06-05-DMRBackground-Exon.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2b. Introns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### DML" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 192\n", "DML overlaps with introns\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {DMLlist} \\\n", "-b {intronList} \\\n", "| wc -l\n", "!echo \"DML overlaps with introns\"" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! 
{bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {DMLlist} \\\n", "-b {intronList} \\\n", "> 2019-06-05-DML-Intron.txt" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t401630\t401632\t53\tNC_035780.1\t401604\t401800\r\n", "NC_035780.1\t1882691\t1882693\t64\tNC_035780.1\t1882355\t1882971\r\n", "NC_035780.1\t1885022\t1885024\t61\tNC_035780.1\t1884754\t1886042\r\n", "NC_035780.1\t1933499\t1933501\t51\tNC_035780.1\t1932876\t1933573\r\n", "NC_035780.1\t2541726\t2541728\t-54\tNC_035780.1\t2538955\t2541768\r\n", "NC_035780.1\t2584492\t2584494\t56\tNC_035780.1\t2584153\t2584504\r\n", "NC_035780.1\t4288213\t4288215\t-58\tNC_035780.1\t4288128\t4288230\r\n", "NC_035780.1\t8833124\t8833126\t60\tNC_035780.1\t8832171\t8833699\r\n", "NC_035780.1\t17488958\t17488960\t-57\tNC_035780.1\t17488942\t17489178\r\n", "NC_035780.1\t22177828\t22177830\t-51\tNC_035780.1\t22154686\t22178240\r\n" ] } ], "source": [ "!head 2019-06-05-DML-Intron.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Hypermethylated DML" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 99\n", "hypermethylated DML overlaps with introns\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {hyperDML} \\\n", "-b {intronList} \\\n", "| wc -l\n", "!echo \"hypermethylated DML overlaps with introns\"" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "collapsed": false }, "outputs": [], "source": [ "! 
{bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {hyperDML} \\\n", "-b {intronList} \\\n", "> 2019-06-05-Hypermethylated-DML-Intron.txt" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t401630\t401632\t53\tNC_035780.1\t401604\t401800\r\n", "NC_035780.1\t1882691\t1882693\t64\tNC_035780.1\t1882355\t1882971\r\n", "NC_035780.1\t1885022\t1885024\t61\tNC_035780.1\t1884754\t1886042\r\n", "NC_035780.1\t1933499\t1933501\t51\tNC_035780.1\t1932876\t1933573\r\n", "NC_035780.1\t2584492\t2584494\t56\tNC_035780.1\t2584153\t2584504\r\n", "NC_035780.1\t8833124\t8833126\t60\tNC_035780.1\t8832171\t8833699\r\n", "NC_035780.1\t27396182\t27396184\t52\tNC_035780.1\t27396140\t27396706\r\n", "NC_035780.1\t32766797\t32766799\t58\tNC_035780.1\t32766346\t32769863\r\n", "NC_035780.1\t32766797\t32766799\t58\tNC_035780.1\t32766346\t32769863\r\n", "NC_035780.1\t38236493\t38236495\t50\tNC_035780.1\t38236121\t38236506\r\n" ] } ], "source": [ "!head 2019-06-05-Hypermethylated-DML-Intron.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Hypomethylated DML" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 93\n", "hypomethylated DML overlaps with introns\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {hypoDML} \\\n", "-b {intronList} \\\n", "| wc -l\n", "!echo \"hypomethylated DML overlaps with introns\"" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "collapsed": false }, "outputs": [], "source": [ "! 
{bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {hypoDML} \\\n", "-b {intronList} \\\n", "> 2019-05-29-Hypomethylated-DML-Intron.txt" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t2541726\t2541728\t-54\tNC_035780.1\t2538955\t2541768\r\n", "NC_035780.1\t4288213\t4288215\t-58\tNC_035780.1\t4288128\t4288230\r\n", "NC_035780.1\t17488958\t17488960\t-57\tNC_035780.1\t17488942\t17489178\r\n", "NC_035780.1\t22177828\t22177830\t-51\tNC_035780.1\t22154686\t22178240\r\n", "NC_035780.1\t22177828\t22177830\t-51\tNC_035780.1\t22154686\t22178240\r\n", "NC_035780.1\t25858297\t25858299\t-51\tNC_035780.1\t25858281\t25863048\r\n", "NC_035780.1\t31302904\t31302906\t-60\tNC_035780.1\t31302841\t31303151\r\n", "NC_035780.1\t31302934\t31302936\t-58\tNC_035780.1\t31302841\t31303151\r\n", "NC_035780.1\t32717030\t32717032\t-52\tNC_035780.1\t32716795\t32717179\r\n", "NC_035780.1\t35969128\t35969130\t-53\tNC_035780.1\t35969070\t35986498\r\n" ] } ], "source": [ "!head 2019-05-29-Hypomethylated-DML-Intron.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### DMR" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 51\n", "DMR overlaps with introns\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {DMRlist} \\\n", "-b {intronList} \\\n", "| wc -l\n", "!echo \"DMR overlaps with introns\"" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! 
{bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {DMRlist} \\\n", "-b {intronList} \\\n", "> 2019-06-05-DMR-Intron.txt" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t571194\t571200\tDMR\t58\tNC_035780.1\t571194\t572676\r\n", "NC_035780.1\t1885000\t1885100\tDMR\t50\tNC_035780.1\t1884754\t1886042\r\n", "NC_035780.1\t1933500\t1933573\tDMR\t53\tNC_035780.1\t1932876\t1933573\r\n", "NC_035780.1\t2538955\t2539000\tDMR\t-50\tNC_035780.1\t2538955\t2541768\r\n", "NC_035780.1\t28563400\t28563500\tDMR\t61\tNC_035780.1\t28563399\t28564615\r\n", "NC_035780.1\t31302900\t31303000\tDMR\t-60\tNC_035780.1\t31302841\t31303151\r\n", "NC_035780.1\t35969100\t35969200\tDMR\t-53\tNC_035780.1\t35969070\t35986498\r\n", "NC_035780.1\t38236400\t38236500\tDMR\t50\tNC_035780.1\t38236121\t38236506\r\n", "NC_035781.1\t5386493\t5386500\tDMR\t51\tNC_035781.1\t5386493\t5386634\r\n", "NC_035781.1\t7626546\t7626600\tDMR\t-56\tNC_035781.1\t7626546\t7626816\r\n" ] } ], "source": [ "!head 2019-06-05-DMR-Intron.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Hypermethylated DMR" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 27\n", "hyperDMR overlaps with introns\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {hyperDMR} \\\n", "-b {intronList} \\\n", "| wc -l\n", "!echo \"hyperDMR overlaps with introns\"" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! 
{bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {hyperDMR} \\\n", "-b {intronList} \\\n", "> 2019-06-05-HyperDMR-Intron.txt" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t571194\t571201\t58\tNC_035780.1\t571194\t572676\r\n", "NC_035780.1\t1885000\t1885101\t50\tNC_035780.1\t1884754\t1886042\r\n", "NC_035780.1\t1933500\t1933573\t53\tNC_035780.1\t1932876\t1933573\r\n", "NC_035780.1\t28563400\t28563501\t61\tNC_035780.1\t28563399\t28564615\r\n", "NC_035780.1\t38236400\t38236501\t50\tNC_035780.1\t38236121\t38236506\r\n", "NC_035781.1\t5386493\t5386501\t51\tNC_035781.1\t5386493\t5386634\r\n", "NC_035781.1\t24474500\t24474586\t53\tNC_035781.1\t24473564\t24474586\r\n", "NC_035781.1\t43942600\t43942701\t52\tNC_035781.1\t43940334\t43944055\r\n", "NC_035781.1\t45110157\t45110201\t71\tNC_035781.1\t45110157\t45110508\r\n", "NC_035781.1\t53358700\t53358801\t56\tNC_035781.1\t53358683\t53358814\r\n" ] } ], "source": [ "!head 2019-06-05-HyperDMR-Intron.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Hypomethylated DMR" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 24\n", "hypoDMR overlaps with introns\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {hypoDMR} \\\n", "-b {intronList} \\\n", "| wc -l\n", "!echo \"hypoDMR overlaps with introns\"" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! 
{bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {hypoDMR} \\\n", "-b {intronList} \\\n", "> 2019-06-05-HypoDMR-Intron.txt" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t2538955\t2539001\t-50\tNC_035780.1\t2538955\t2541768\r\n", "NC_035780.1\t31302900\t31303001\t-60\tNC_035780.1\t31302841\t31303151\r\n", "NC_035780.1\t35969100\t35969201\t-53\tNC_035780.1\t35969070\t35986498\r\n", "NC_035781.1\t7626546\t7626601\t-56\tNC_035781.1\t7626546\t7626816\r\n", "NC_035781.1\t7626546\t7626601\t-56\tNC_035781.1\t7626546\t7626816\r\n", "NC_035781.1\t13281010\t13281101\t-57\tNC_035781.1\t13281010\t13281749\r\n", "NC_035781.1\t43054100\t43054148\t-60\tNC_035781.1\t43054064\t43054148\r\n", "NC_035781.1\t45110200\t45110301\t-51\tNC_035781.1\t45110157\t45110508\r\n", "NC_035781.1\t59605700\t59605801\t-54\tNC_035781.1\t59605698\t59605807\r\n", "NC_035783.1\t38039000\t38039049\t-51\tNC_035783.1\t38038913\t38039049\r\n" ] } ], "source": [ "!head 2019-06-05-HypoDMR-Intron.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### DMR Background" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 93707\n", "DMR background overlaps with introns\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {DMRBackground} \\\n", "-b {intronList} \\\n", "| wc -l\n", "!echo \"DMR background overlaps with introns\"" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! 
{bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {DMRBackground} \\\n", "-b {intronList} \\\n", "> 2019-06-05-DMRBackground-Intron.txt" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t100501\t100553\t*\tNC_035780.1\t100122\t100553\r\n", "NC_035780.1\t100661\t100700\t*\tNC_035780.1\t100661\t104928\r\n", "NC_035780.1\t103201\t103300\t*\tNC_035780.1\t100661\t104928\r\n", "NC_035780.1\t250608\t250700\t*\tNC_035780.1\t250608\t252746\r\n", "NC_035780.1\t250701\t250800\t*\tNC_035780.1\t250608\t252746\r\n", "NC_035780.1\t259494\t259500\t*\tNC_035780.1\t259494\t261477\r\n", "NC_035780.1\t259801\t259900\t*\tNC_035780.1\t259494\t261477\r\n", "NC_035780.1\t260001\t260100\t*\tNC_035780.1\t259494\t261477\r\n", "NC_035780.1\t260101\t260200\t*\tNC_035780.1\t259494\t261477\r\n", "NC_035780.1\t260401\t260500\t*\tNC_035780.1\t259494\t261477\r\n" ] } ], "source": [ "!head 2019-06-05-DMRBackground-Intron.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2c. Exon UTR" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### DML" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 27\n", "DML overlaps with exon UTR\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {DMLlist} \\\n", "-b {exonUTR} \\\n", "| wc -l\n", "!echo \"DML overlaps with exon UTR\"" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! 
{bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {DMLlist} \\\n", "-b {exonUTR} \\\n", "> 2019-05-29-DML-exonUTR.txt" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t2538924\t2538926\t-50\tNC_035780.1\tGnomon\texon\t2538681\t2538955\t.\t-\t.\tID=id1879;Parent=rna219;Dbxref=GeneID:111131682,Genbank:XM_022479355.1;gbkey=mRNA;gene=LOC111131682;product=oxysterol-binding protein-related protein 8-like%2C transcript variant X5;transcript_id=XM_022479355.1\r\n", "NC_035780.1\t17093218\t17093220\t-52\tNC_035780.1\tGnomon\texon\t17093143\t17093548\t.\t+\t.\tID=id17107;Parent=rna1785;Dbxref=GeneID:111102762,Genbank:XM_022435632.1;gbkey=mRNA;gene=LOC111102762;product=U1 small nuclear ribonucleoprotein A-like;transcript_id=XM_022435632.1\r\n", "NC_035780.1\t62782868\t62782870\t-50\tNC_035780.1\tGnomon\texon\t62782792\t62782981\t.\t+\t.\tID=id65638;Parent=gene3758;Dbxref=GeneID:111108434;gbkey=exon;gene=LOC111108434;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 91%25 coverage of the annotated genomic feature by RNAseq alignments\r\n", "NC_035781.1\t5386447\t5386449\t50\tNC_035781.1\tGnomon\texon\t5386310\t5386493\t.\t-\t.\tID=id76787;Parent=rna7337;Dbxref=GeneID:111120002,Genbank:XM_022460606.1;gbkey=mRNA;gene=LOC111120002;product=EF-hand calcium-binding domain-containing protein 14-like%2C transcript variant X2;transcript_id=XM_022460606.1\r\n", "NC_035781.1\t5386447\t5386449\t50\tNC_035781.1\tGnomon\texon\t5386310\t5386493\t.\t-\t.\tID=id76803;Parent=rna7338;Dbxref=GeneID:111120002,Genbank:XM_022460605.1;gbkey=mRNA;gene=LOC111120002;product=EF-hand calcium-binding domain-containing protein 14-like%2C transcript variant X1;transcript_id=XM_022460605.1\r\n", 
"NC_035782.1\t3352881\t3352883\t56\tNC_035782.1\tGnomon\texon\t3351009\t3353057\t.\t-\t.\tID=id153000;Parent=rna13828;Dbxref=GeneID:111124535,Genbank:XM_022467472.1;gbkey=mRNA;gene=LOC111124535;product=UDP-glucose 6-dehydrogenase-like%2C transcript variant X2;transcript_id=XM_022467472.1\r\n", "NC_035782.1\t3352881\t3352883\t56\tNC_035782.1\tGnomon\texon\t3351009\t3353057\t.\t-\t.\tID=id152976;Parent=rna13826;Dbxref=GeneID:111124535,Genbank:XM_022467471.1;gbkey=mRNA;gene=LOC111124535;product=UDP-glucose 6-dehydrogenase-like%2C transcript variant X1;transcript_id=XM_022467471.1\r\n", "NC_035782.1\t3352881\t3352883\t56\tNC_035782.1\tGnomon\texon\t3351009\t3353057\t.\t-\t.\tID=id152964;Parent=rna13825;Dbxref=GeneID:111124535,Genbank:XM_022467475.1;gbkey=mRNA;gene=LOC111124535;product=UDP-glucose 6-dehydrogenase-like%2C transcript variant X5;transcript_id=XM_022467475.1\r\n", "NC_035782.1\t3352881\t3352883\t56\tNC_035782.1\tGnomon\texon\t3351009\t3353057\t.\t-\t.\tID=id152939;Parent=rna13823;Dbxref=GeneID:111124535,Genbank:XM_022467474.1;gbkey=mRNA;gene=LOC111124535;product=UDP-glucose 6-dehydrogenase-like%2C transcript variant X4;transcript_id=XM_022467474.1\r\n", "NC_035782.1\t3352881\t3352883\t56\tNC_035782.1\tGnomon\texon\t3351009\t3353057\t.\t-\t.\tID=id152988;Parent=rna13827;Dbxref=GeneID:111124535,Genbank:XM_022467473.1;gbkey=mRNA;gene=LOC111124535;product=UDP-glucose 6-dehydrogenase-like%2C transcript variant X3;transcript_id=XM_022467473.1\r\n" ] } ], "source": [ "!head 2019-05-29-DML-exonUTR.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2d. mRNA" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### DML" ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 549\n", "DML overlaps with mRNA\n" ] } ], "source": [ "! 
{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {DMLlist} \\\n", "-b {mRNAList} \\\n", "| wc -l\n", "!echo \"DML overlaps with mRNA\"" ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {DMLlist} \\\n", "-b {mRNAList} \\\n", "> 2019-05-29-DML-mRNA.txt" ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t2538924\t2538926\t-50\tNC_035780.1\tGnomon\texon\t2538681\t2538955\t.\t-\t.\tID=id1879;Parent=rna219;Dbxref=GeneID:111131682,Genbank:XM_022479355.1;gbkey=mRNA;gene=LOC111131682;product=oxysterol-binding protein-related protein 8-like%2C transcript variant X5;transcript_id=XM_022479355.1\r\n", "NC_035780.1\t17093218\t17093220\t-52\tNC_035780.1\tGnomon\texon\t17093143\t17093548\t.\t+\t.\tID=id17107;Parent=rna1785;Dbxref=GeneID:111102762,Genbank:XM_022435632.1;gbkey=mRNA;gene=LOC111102762;product=U1 small nuclear ribonucleoprotein A-like;transcript_id=XM_022435632.1\r\n", "NC_035780.1\t62782868\t62782870\t-50\tNC_035780.1\tGnomon\texon\t62782792\t62782981\t.\t+\t.\tID=id65638;Parent=gene3758;Dbxref=GeneID:111108434;gbkey=exon;gene=LOC111108434;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 91%25 coverage of the annotated genomic feature by RNAseq alignments\r\n", "NC_035781.1\t5386447\t5386449\t50\tNC_035781.1\tGnomon\texon\t5386310\t5386493\t.\t-\t.\tID=id76787;Parent=rna7337;Dbxref=GeneID:111120002,Genbank:XM_022460606.1;gbkey=mRNA;gene=LOC111120002;product=EF-hand calcium-binding domain-containing protein 14-like%2C transcript variant X2;transcript_id=XM_022460606.1\r\n", "NC_035781.1\t5386447\t5386449\t50\tNC_035781.1\tGnomon\texon\t5386310\t5386493\t.\t-\t.\tID=id76803;Parent=rna7338;Dbxref=GeneID:111120002,Genbank:XM_022460605.1;gbkey=mRNA;gene=LOC111120002;product=EF-hand 
calcium-binding domain-containing protein 14-like%2C transcript variant X1;transcript_id=XM_022460605.1\r\n", "NC_035782.1\t3352881\t3352883\t56\tNC_035782.1\tGnomon\texon\t3351009\t3353057\t.\t-\t.\tID=id153000;Parent=rna13828;Dbxref=GeneID:111124535,Genbank:XM_022467472.1;gbkey=mRNA;gene=LOC111124535;product=UDP-glucose 6-dehydrogenase-like%2C transcript variant X2;transcript_id=XM_022467472.1\r\n", "NC_035782.1\t3352881\t3352883\t56\tNC_035782.1\tGnomon\texon\t3351009\t3353057\t.\t-\t.\tID=id152976;Parent=rna13826;Dbxref=GeneID:111124535,Genbank:XM_022467471.1;gbkey=mRNA;gene=LOC111124535;product=UDP-glucose 6-dehydrogenase-like%2C transcript variant X1;transcript_id=XM_022467471.1\r\n", "NC_035782.1\t3352881\t3352883\t56\tNC_035782.1\tGnomon\texon\t3351009\t3353057\t.\t-\t.\tID=id152964;Parent=rna13825;Dbxref=GeneID:111124535,Genbank:XM_022467475.1;gbkey=mRNA;gene=LOC111124535;product=UDP-glucose 6-dehydrogenase-like%2C transcript variant X5;transcript_id=XM_022467475.1\r\n", "NC_035782.1\t3352881\t3352883\t56\tNC_035782.1\tGnomon\texon\t3351009\t3353057\t.\t-\t.\tID=id152939;Parent=rna13823;Dbxref=GeneID:111124535,Genbank:XM_022467474.1;gbkey=mRNA;gene=LOC111124535;product=UDP-glucose 6-dehydrogenase-like%2C transcript variant X4;transcript_id=XM_022467474.1\r\n", "NC_035782.1\t3352881\t3352883\t56\tNC_035782.1\tGnomon\texon\t3351009\t3353057\t.\t-\t.\tID=id152988;Parent=rna13827;Dbxref=GeneID:111124535,Genbank:XM_022467473.1;gbkey=mRNA;gene=LOC111124535;product=UDP-glucose 6-dehydrogenase-like%2C transcript variant X3;transcript_id=XM_022467473.1\r\n" ] } ], "source": [ "!head 2019-05-29-DML-mRNA.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2e. 
Coding sequences" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### DML" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 341\n", "DML overlaps with CDS\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {DMLlist} \\\n", "-b {CDS} \\\n", "| wc -l\n", "!echo \"DML overlaps with CDS\"" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {DMLlist} \\\n", "-b {CDS} \\\n", "> 2019-05-29-DML-CDS.txt" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t571138\t571140\t58\tNC_035780.1\t570942\t571194\r\n", "NC_035780.1\t2586508\t2586510\t-53\tNC_035780.1\t2586438\t2586557\r\n", "NC_035780.1\t2589720\t2589722\t57\tNC_035780.1\t2589716\t2589955\r\n", "NC_035780.1\t4286286\t4286288\t67\tNC_035780.1\t4286174\t4286407\r\n", "NC_035780.1\t4286802\t4286804\t-62\tNC_035780.1\t4286783\t4286927\r\n", "NC_035780.1\t4289628\t4289630\t-52\tNC_035780.1\t4288592\t4290756\r\n", "NC_035780.1\t8693287\t8693289\t-52\tNC_035780.1\t8693228\t8693320\r\n", "NC_035780.1\t9110274\t9110276\t-63\tNC_035780.1\t9109982\t9110377\r\n", "NC_035780.1\t12631453\t12631455\t60\tNC_035780.1\t12631096\t12631487\r\n", "NC_035780.1\t12631453\t12631455\t60\tNC_035780.1\t12631096\t12631487\r\n" ] } ], "source": [ "!head 2019-05-29-DML-CDS.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2f. 
Non-coding sequences" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### DML" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 230\n", "DML overlaps with nonCDS\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {DMLlist} \\\n", "-b {nonCDS} \\\n", "| wc -l\n", "!echo \"DML overlaps with nonCDS\"" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {DMLlist} \\\n", "-b {nonCDS} \\\n", "> 2019-05-29-DML-nonCDS.txt" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t401630\t401632\t53\tNC_035780.1\t401604\t401800\r\n", "NC_035780.1\t1882691\t1882693\t64\tNC_035780.1\t1882355\t1882971\r\n", "NC_035780.1\t1885022\t1885024\t61\tNC_035780.1\t1884754\t1886042\r\n", "NC_035780.1\t1933499\t1933501\t51\tNC_035780.1\t1932876\t1933573\r\n", "NC_035780.1\t2541726\t2541728\t-54\tNC_035780.1\t2538955\t2541768\r\n", "NC_035780.1\t2584492\t2584494\t56\tNC_035780.1\t2584153\t2584504\r\n", "NC_035780.1\t4288213\t4288215\t-58\tNC_035780.1\t4288128\t4288230\r\n", "NC_035780.1\t8833124\t8833126\t60\tNC_035780.1\t8832171\t8833699\r\n", "NC_035780.1\t17488958\t17488960\t-57\tNC_035780.1\t17488942\t17489178\r\n", "NC_035780.1\t22177828\t22177830\t-51\tNC_035780.1\t22154686\t22178240\r\n" ] } ], "source": [ "!head 2019-05-29-DML-nonCDS.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2g. 
Genes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### DML" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 560\n", "DML overlaps with genes\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {DMLlist} \\\n", "-b {geneList} \\\n", "| wc -l\n", "!echo \"DML overlaps with genes\"" ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {DMLlist} \\\n", "-b {geneList} \\\n", "> 2019-05-29-DML-Genes.txt" ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t401630\t401632\t53\tNC_035780.1\t394983\t409280\r\n", "NC_035780.1\t571138\t571140\t58\tNC_035780.1\t544088\t573497\r\n", "NC_035780.1\t1882691\t1882693\t64\tNC_035780.1\t1882143\t1890106\r\n", "NC_035780.1\t1885022\t1885024\t61\tNC_035780.1\t1882143\t1890106\r\n", "NC_035780.1\t1933499\t1933501\t51\tNC_035780.1\t1928718\t1940217\r\n", "NC_035780.1\t2538924\t2538926\t-50\tNC_035780.1\t2524425\t2553408\r\n", "NC_035780.1\t2541726\t2541728\t-54\tNC_035780.1\t2524425\t2553408\r\n", "NC_035780.1\t2584492\t2584494\t56\tNC_035780.1\t2554181\t2599559\r\n", "NC_035780.1\t2586508\t2586510\t-53\tNC_035780.1\t2554181\t2599559\r\n", "NC_035780.1\t2589720\t2589722\t57\tNC_035780.1\t2554181\t2599559\r\n" ] } ], "source": [ "!head 2019-05-29-DML-Genes.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I know how many overlaps there are, but I also want to know how many unique genes have DMLs in them. For this, I will use the following code:\n", "\n", "`cut -f7 2019-05-29-DML-Genes.txt | sort | uniq -c`\n", "\n", "`cut` is the command that isolates the column information. 
The seventh column (`-f7`) holds the gene end position, which I'll treat as a stand-in for gene identity on the assumption that each gene has a unique end position. `sort` places identical values next to each other so that `uniq` can collapse them into one line each (the `-c` flag only prefixes each line with its count and does not change the number of lines). Finally, I'll pipe this into `wc -l` to count the number of unique genes." ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 481\n", "unique genes overlapping with DML\n" ] } ], "source": [ "! cut -f7 2019-05-29-DML-Genes.txt | sort | uniq -c | wc -l\n", "!echo \"unique genes overlapping with DML\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Hypermethylated DML" ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 289\n", "hypermethylated DML overlaps with genes\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {hyperDML} \\\n", "-b {geneList} \\\n", "| wc -l\n", "!echo \"hypermethylated DML overlaps with genes\"" ] }, { "cell_type": "code", "execution_count": 59, "metadata": { "collapsed": false }, "outputs": [], "source": [ "! 
{bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {hyperDML} \\\n", "-b {geneList} \\\n", "> 2019-05-29-Hypermethylated-DML-Genes.txt" ] }, { "cell_type": "code", "execution_count": 60, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t401630\t401632\t53\tNC_035780.1\t394983\t409280\r\n", "NC_035780.1\t571138\t571140\t58\tNC_035780.1\t544088\t573497\r\n", "NC_035780.1\t1882691\t1882693\t64\tNC_035780.1\t1882143\t1890106\r\n", "NC_035780.1\t1885022\t1885024\t61\tNC_035780.1\t1882143\t1890106\r\n", "NC_035780.1\t1933499\t1933501\t51\tNC_035780.1\t1928718\t1940217\r\n", "NC_035780.1\t2584492\t2584494\t56\tNC_035780.1\t2554181\t2599559\r\n", "NC_035780.1\t2589720\t2589722\t57\tNC_035780.1\t2554181\t2599559\r\n", "NC_035780.1\t4286286\t4286288\t67\tNC_035780.1\t4282771\t4298209\r\n", "NC_035780.1\t8833124\t8833126\t60\tNC_035780.1\t8829533\t8833841\r\n", "NC_035780.1\t12631453\t12631455\t60\tNC_035780.1\t12630576\t12697104\r\n" ] } ], "source": [ "!head 2019-05-29-Hypermethylated-DML-Genes.txt" ] }, { "cell_type": "code", "execution_count": 63, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 269\n", "unique genes overlapping with hypermethylated DML\n" ] } ], "source": [ "! cut -f7 2019-05-29-Hypermethylated-DML-Genes.txt | sort | uniq -c | wc -l\n", "!echo \"unique genes overlapping with hypermethylated DML\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Hypomethylated DML" ] }, { "cell_type": "code", "execution_count": 64, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 271\n", "hypomethylated DML overlaps with genes\n" ] } ], "source": [ "! 
{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {hypoDML} \\\n", "-b {geneList} \\\n", "| wc -l\n", "!echo \"hypomethylated DML overlaps with genes\"" ] }, { "cell_type": "code", "execution_count": 65, "metadata": { "collapsed": false }, "outputs": [], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {hypoDML} \\\n", "-b {geneList} \\\n", "> 2019-05-29-Hypomethylated-DML-mRNA.txt" ] }, { "cell_type": "code", "execution_count": 66, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t2538924\t2538926\t-50\tNC_035780.1\t2524425\t2553408\r\n", "NC_035780.1\t2541726\t2541728\t-54\tNC_035780.1\t2524425\t2553408\r\n", "NC_035780.1\t2586508\t2586510\t-53\tNC_035780.1\t2554181\t2599559\r\n", "NC_035780.1\t4286802\t4286804\t-62\tNC_035780.1\t4282771\t4298209\r\n", "NC_035780.1\t4288213\t4288215\t-58\tNC_035780.1\t4282771\t4298209\r\n", "NC_035780.1\t4289628\t4289630\t-52\tNC_035780.1\t4282771\t4298209\r\n", "NC_035780.1\t8693287\t8693289\t-52\tNC_035780.1\t8692509\t8698183\r\n", "NC_035780.1\t9110274\t9110276\t-63\tNC_035780.1\t9103662\t9111843\r\n", "NC_035780.1\t17093218\t17093220\t-52\tNC_035780.1\t17089706\t17093548\r\n", "NC_035780.1\t17488958\t17488960\t-57\tNC_035780.1\t17457431\t17541765\r\n" ] } ], "source": [ "!head 2019-05-29-Hypomethylated-DML-mRNA.txt" ] }, { "cell_type": "code", "execution_count": 67, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 241\n", "unique genes overlapping with hypomethylated DML\n" ] } ], "source": [ "! 
cut -f7 2019-05-29-Hypomethylated-DML-mRNA.txt | sort | uniq -c | wc -l\n", "!echo \"unique genes overlapping with hypomethylated DML\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### DMR" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 66\n", "DMR overlaps with genes\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {DMRlist} \\\n", "-b {geneList} \\\n", "| wc -l\n", "!echo \"DMR overlaps with genes\"" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {DMRlist} \\\n", "-b {geneList} \\\n", "> 2019-06-05-DMR-Genes.txt" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t571100\t571200\tDMR\t58\tNC_035780.1\t544088\t573497\r\n", "NC_035780.1\t1885000\t1885100\tDMR\t50\tNC_035780.1\t1882143\t1890106\r\n", "NC_035780.1\t1933500\t1933600\tDMR\t53\tNC_035780.1\t1928718\t1940217\r\n", "NC_035780.1\t2538900\t2539000\tDMR\t-50\tNC_035780.1\t2524425\t2553408\r\n", "NC_035780.1\t22276700\t22276800\tDMR\t56\tNC_035780.1\t22269635\t22278631\r\n", "NC_035780.1\t28563400\t28563500\tDMR\t61\tNC_035780.1\t28552157\t28576101\r\n", "NC_035780.1\t31302900\t31303000\tDMR\t-60\tNC_035780.1\t31295876\t31307973\r\n", "NC_035780.1\t35969100\t35969200\tDMR\t-53\tNC_035780.1\t35960923\t35999467\r\n", "NC_035780.1\t38236400\t38236500\tDMR\t50\tNC_035780.1\t38209799\t38243110\r\n", "NC_035781.1\t5386400\t5386500\tDMR\t51\tNC_035781.1\t5383711\t5397505\r\n" ] } ], "source": [ "!head 2019-06-05-DMR-Genes.txt" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 65\n", 
"DMR overlaps with unique genes\n" ] } ], "source": [ "! cut -f8 2019-06-05-DMR-Genes.txt | sort | uniq -c | wc -l\n", "!echo \"DMR overlaps with unique genes\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Hypermethylated DMR" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 33\n", "hyperDMR overlaps with genes\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {hyperDMR} \\\n", "-b {geneList} \\\n", "| wc -l\n", "!echo \"hyperDMR overlaps with genes\"" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {hyperDMR} \\\n", "-b {geneList} \\\n", "> 2019-06-05-HyperDMR-Genes.txt" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t571100\t571201\t58\tNC_035780.1\t544088\t573497\r\n", "NC_035780.1\t1885000\t1885101\t50\tNC_035780.1\t1882143\t1890106\r\n", "NC_035780.1\t1933500\t1933601\t53\tNC_035780.1\t1928718\t1940217\r\n", "NC_035780.1\t22276700\t22276801\t56\tNC_035780.1\t22269635\t22278631\r\n", "NC_035780.1\t28563400\t28563501\t61\tNC_035780.1\t28552157\t28576101\r\n", "NC_035780.1\t38236400\t38236501\t50\tNC_035780.1\t38209799\t38243110\r\n", "NC_035781.1\t5386400\t5386501\t51\tNC_035781.1\t5383711\t5397505\r\n", "NC_035781.1\t24474500\t24474601\t53\tNC_035781.1\t24468785\t24491957\r\n", "NC_035781.1\t43942600\t43942701\t52\tNC_035781.1\t43936785\t43944143\r\n", "NC_035781.1\t45110100\t45110201\t71\tNC_035781.1\t45108521\t45113815\r\n" ] } ], "source": [ "!head 2019-06-05-HyperDMR-Genes.txt" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", 
"text": [ " 33\n", "hyperDMR overlaps with unique genes\n" ] } ], "source": [ "! cut -f7 2019-06-05-HyperDMR-Genes.txt | sort | uniq -c | wc -l\n", "!echo \"hyperDMR overlaps with unique genes\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Hypomethylated DMR" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 33\n", "hypoDMR overlaps with genes\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {hypoDMR} \\\n", "-b {geneList} \\\n", "| wc -l\n", "!echo \"hypoDMR overlaps with genes\"" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {hypoDMR} \\\n", "-b {geneList} \\\n", "> 2019-06-05-HypoDMR-Genes.txt" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t2538900\t2539001\t-50\tNC_035780.1\t2524425\t2553408\r\n", "NC_035780.1\t31302900\t31303001\t-60\tNC_035780.1\t31295876\t31307973\r\n", "NC_035780.1\t35969100\t35969201\t-53\tNC_035780.1\t35960923\t35999467\r\n", "NC_035781.1\t7626500\t7626601\t-56\tNC_035781.1\t7589782\t7641768\r\n", "NC_035781.1\t7626500\t7626601\t-56\tNC_035781.1\t7590843\t7626904\r\n", "NC_035781.1\t13281000\t13281101\t-57\tNC_035781.1\t13268605\t13286933\r\n", "NC_035781.1\t20126000\t20126101\t-52\tNC_035781.1\t20125441\t20128033\r\n", "NC_035781.1\t30789600\t30789701\t-57\tNC_035781.1\t30781806\t30790310\r\n", "NC_035781.1\t43054100\t43054201\t-60\tNC_035781.1\t43048000\t43060502\r\n", "NC_035781.1\t45110200\t45110301\t-51\tNC_035781.1\t45108521\t45113815\r\n" ] } ], "source": [ "!head 2019-06-05-HypoDMR-Genes.txt" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", 
"output_type": "stream", "text": [ " 33\n", "hyperDMR overlaps with unique genes\n" ] } ], "source": [ "! cut -f7 2019-06-05-HypoDMR-Genes.txt | sort | uniq -c | wc -l\n", "!echo \"hyperDMR overlaps with unique genes\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### DMR Background" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 142153\n", "DMR background overlaps with genes\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {DMRBackground} \\\n", "-b {geneList} \\\n", "| wc -l\n", "!echo \"DMR background overlaps with genes\"" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {DMRBackground} \\\n", "-b {geneList} \\\n", "> 2019-06-05-DMRBackground-Genes.txt" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t100501\t100600\t*\tNC_035780.1\t99840\t106460\r\n", "NC_035780.1\t100601\t100700\t*\tNC_035780.1\t99840\t106460\r\n", "NC_035780.1\t103201\t103300\t*\tNC_035780.1\t99840\t106460\r\n", "NC_035780.1\t250301\t250400\t*\tNC_035780.1\t245532\t253042\r\n", "NC_035780.1\t250401\t250500\t*\tNC_035780.1\t245532\t253042\r\n", "NC_035780.1\t250501\t250600\t*\tNC_035780.1\t245532\t253042\r\n", "NC_035780.1\t250601\t250700\t*\tNC_035780.1\t245532\t253042\r\n", "NC_035780.1\t250701\t250800\t*\tNC_035780.1\t245532\t253042\r\n", "NC_035780.1\t258108\t258200\t*\tNC_035780.1\t258108\t272839\r\n", "NC_035780.1\t258201\t258300\t*\tNC_035780.1\t258108\t272839\r\n" ] } ], "source": [ "!head 2019-06-05-DMRBackground-Genes.txt" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": 
"stream", "text": [ " 11578\n", "DMR background overlaps with unique genes\n" ] } ], "source": [ "! cut -f7 2019-06-05-DMRBackground-Genes.txt | sort | uniq -c | wc -l\n", "!echo \"DMR background overlaps with unique genes\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2h. Transposable Elements (All)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### DML" ] }, { "cell_type": "code", "execution_count": 68, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 57\n", "DML overlaps with transposable elements (all)\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {DMLlist} \\\n", "-b {transposableElementsAll} \\\n", "| wc -l\n", "!echo \"DML overlaps with transposable elements (all)\"" ] }, { "cell_type": "code", "execution_count": 69, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {DMLlist} \\\n", "-b {transposableElementsAll} \\\n", "> 2019-05-29-DML-TE-all.txt" ] }, { "cell_type": "code", "execution_count": 70, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t8833124\t8833126\t60\tNC_035780.1\tRepeatMasker\tsimilarity\t8833042\t8833288\t18.2\t-\t.\tTarget \"Motif:CVA\" 1 272\r\n", "NC_035780.1\t22177828\t22177830\t-51\tNC_035780.1\tRepeatMasker\tsimilarity\t22177766\t22177877\t22.3\t-\t.\tTarget \"Motif:DNA9-6_CGi\" 1 115\r\n", "NC_035780.1\t57337100\t57337102\t-54\tNC_035780.1\tRepeatMasker\tsimilarity\t57337042\t57337128\t18.6\t-\t.\tTarget \"Motif:DNA2-2_CGi\" 413 498\r\n", "NC_035780.1\t58135767\t58135769\t74\tNC_035780.1\tRepeatMasker\tsimilarity\t58135699\t58135837\t22.4\t+\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 169 314\r\n", "NC_035781.1\t22439769\t22439771\t53\tNC_035781.1\tRepeatMasker\tsimilarity\t22439740\t22439796\t28.1\t+\t.\tTarget \"Motif:Mariner-6_AMi\" 698 
754\r\n", "NC_035781.1\t29178318\t29178320\t-55\tNC_035781.1\tRepeatMasker\tsimilarity\t29177336\t29178341\t16.0\t-\t.\tTarget \"Motif:CVA\" 2 863\r\n", "NC_035781.1\t54151548\t54151550\t54\tNC_035781.1\tRepeatMasker\tsimilarity\t54150482\t54151750\t14.3\t+\t.\tTarget \"Motif:CVA\" 1 1018\r\n", "NC_035781.1\t59742649\t59742651\t-65\tNC_035781.1\tRepeatMasker\tsimilarity\t59742603\t59742651\t 4.2\t+\t.\tTarget \"Motif:(ACTAACG)n\" 1 49\r\n", "NC_035782.1\t6685343\t6685345\t-68\tNC_035782.1\tRepeatMasker\tsimilarity\t6685308\t6685646\t15.0\t+\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 1 335\r\n", "NC_035782.1\t6685349\t6685351\t-50\tNC_035782.1\tRepeatMasker\tsimilarity\t6685308\t6685646\t15.0\t+\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 1 335\r\n" ] } ], "source": [ "!head 2019-05-29-DML-TE-all.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Hypermethylated DML" ] }, { "cell_type": "code", "execution_count": 71, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 26\n", "hypermethylated DML overlaps with TE (all)\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {hyperDML} \\\n", "-b {transposableElementsAll} \\\n", "| wc -l\n", "!echo \"hypermethylated DML overlaps with TE (all)\"" ] }, { "cell_type": "code", "execution_count": 72, "metadata": { "collapsed": false }, "outputs": [], "source": [ "! 
{bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {hyperDML} \\\n", "-b {transposableElementsAll} \\\n", "> 2019-05-29-Hypermethylated-DML-TEall.txt" ] }, { "cell_type": "code", "execution_count": 73, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t8833124\t8833126\t60\tNC_035780.1\tRepeatMasker\tsimilarity\t8833042\t8833288\t18.2\t-\t.\tTarget \"Motif:CVA\" 1 272\r\n", "NC_035780.1\t58135767\t58135769\t74\tNC_035780.1\tRepeatMasker\tsimilarity\t58135699\t58135837\t22.4\t+\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 169 314\r\n", "NC_035781.1\t22439769\t22439771\t53\tNC_035781.1\tRepeatMasker\tsimilarity\t22439740\t22439796\t28.1\t+\t.\tTarget \"Motif:Mariner-6_AMi\" 698 754\r\n", "NC_035781.1\t54151548\t54151550\t54\tNC_035781.1\tRepeatMasker\tsimilarity\t54150482\t54151750\t14.3\t+\t.\tTarget \"Motif:CVA\" 1 1018\r\n", "NC_035782.1\t45857195\t45857197\t52\tNC_035782.1\tRepeatMasker\tsimilarity\t45857026\t45858123\t32.7\t+\t.\tTarget \"Motif:Mariner-21_LCh\" 450 1999\r\n", "NC_035782.1\t53693367\t53693369\t61\tNC_035782.1\tRepeatMasker\tsimilarity\t53693299\t53693466\t19.6\t+\t.\tTarget \"Motif:Crypton-N6B_CGi\" 566 735\r\n", "NC_035782.1\t58675269\t58675271\t50\tNC_035782.1\tRepeatMasker\tsimilarity\t58675249\t58675337\t19.1\t-\t.\tTarget \"Motif:DNA3-12_CGi\" 290 378\r\n", "NC_035782.1\t61203970\t61203972\t51\tNC_035782.1\tRepeatMasker\tsimilarity\t61203541\t61204351\t12.3\t-\t.\tTarget \"Motif:CVA\" 1 686\r\n", "NC_035783.1\t4336100\t4336102\t63\tNC_035783.1\tRepeatMasker\tsimilarity\t4335884\t4336135\t21.8\t+\t.\tTarget \"Motif:DNA8-4_CGi\" 42 268\r\n", "NC_035783.1\t23130125\t23130127\t53\tNC_035783.1\tRepeatMasker\tsimilarity\t23130086\t23130209\t18.6\t+\t.\tTarget \"Motif:Crypton-8N1_CGi\" 516 639\r\n" ] } ], "source": [ "!head 2019-05-29-Hypermethylated-DML-TEall.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Hypomethylated DML" ] }, { 
"cell_type": "code", "execution_count": 36, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 31\n", "hypomethylated DML overlaps with TE (all)\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {hypoDML} \\\n", "-b {transposableElementsAll} \\\n", "| wc -l\n", "!echo \"hypomethylated DML overlaps with TE (all)\"" ] }, { "cell_type": "code", "execution_count": 74, "metadata": { "collapsed": false }, "outputs": [], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {hypoDML} \\\n", "-b {transposableElementsAll} \\\n", "> 2019-05-29-Hypomethylated-DML-TEall.txt" ] }, { "cell_type": "code", "execution_count": 75, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t22177828\t22177830\t-51\tNC_035780.1\tRepeatMasker\tsimilarity\t22177766\t22177877\t22.3\t-\t.\tTarget \"Motif:DNA9-6_CGi\" 1 115\r\n", "NC_035780.1\t57337100\t57337102\t-54\tNC_035780.1\tRepeatMasker\tsimilarity\t57337042\t57337128\t18.6\t-\t.\tTarget \"Motif:DNA2-2_CGi\" 413 498\r\n", "NC_035781.1\t29178318\t29178320\t-55\tNC_035781.1\tRepeatMasker\tsimilarity\t29177336\t29178341\t16.0\t-\t.\tTarget \"Motif:CVA\" 2 863\r\n", "NC_035781.1\t59742649\t59742651\t-65\tNC_035781.1\tRepeatMasker\tsimilarity\t59742603\t59742651\t 4.2\t+\t.\tTarget \"Motif:(ACTAACG)n\" 1 49\r\n", "NC_035782.1\t6685343\t6685345\t-68\tNC_035782.1\tRepeatMasker\tsimilarity\t6685308\t6685646\t15.0\t+\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 1 335\r\n", "NC_035782.1\t6685349\t6685351\t-50\tNC_035782.1\tRepeatMasker\tsimilarity\t6685308\t6685646\t15.0\t+\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 1 335\r\n", "NC_035782.1\t34498893\t34498895\t-55\tNC_035782.1\tRepeatMasker\tsimilarity\t34498501\t34500091\t24.8\t+\t.\tTarget \"Motif:Helitron-N40_CGi\" 1 1569\r\n", 
"NC_035782.1\t34498895\t34498897\t-71\tNC_035782.1\tRepeatMasker\tsimilarity\t34498501\t34500091\t24.8\t+\t.\tTarget \"Motif:Helitron-N40_CGi\" 1 1569\r\n", "NC_035783.1\t32417238\t32417240\t-60\tNC_035783.1\tRepeatMasker\tsimilarity\t32417040\t32417268\t14.7\t-\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 2 319\r\n", "NC_035783.1\t41484688\t41484690\t-59\tNC_035783.1\tRepeatMasker\tsimilarity\t41484586\t41484823\t11.8\t+\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 61 337\r\n" ] } ], "source": [ "!head 2019-05-29-Hypomethylated-DML-TEall.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### DMR" ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 11\n", "DMR overlaps with transposable elements (all)\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {DMRlist} \\\n", "-b {transposableElementsAll} \\\n", "| wc -l\n", "!echo \"DMR overlaps with transposable elements (all)\"" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! 
{bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {DMRlist} \\\n", "-b {transposableElementsAll} \\\n", "> 2019-06-05-DMR-TE-all.txt" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035781.1\t54151500\t54151600\tDMR\t54\tNC_035781.1\tRepeatMasker\tsimilarity\t54150482\t54151750\t14.3\t+\t.\tTarget \"Motif:CVA\" 1 1018\r\n", "NC_035783.1\t30386535\t30386600\tDMR\t52\tNC_035783.1\tRepeatMasker\tsimilarity\t30386536\t30387049\t25.2\t+\t.\tTarget \"Motif:Kolobok-N4_CGi\" 1 497\r\n", "NC_035784.1\t41345100\t41345105\tDMR\t-51\tNC_035784.1\tRepeatMasker\tsimilarity\t41345048\t41345105\t 6.9\t+\t.\tTarget \"Motif:Helitron-10_CGi\" 282 358\r\n", "NC_035784.1\t41345184\t41345200\tDMR\t-51\tNC_035784.1\tRepeatMasker\tsimilarity\t41345185\t41345249\t20.3\t-\t.\tTarget \"Motif:Helitron-N42_CGi\" 1 65\r\n", "NC_035784.1\t57163819\t57163900\tDMR\t-63\tNC_035784.1\tRepeatMasker\tsimilarity\t57163820\t57163967\t27.4\t+\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 179 325\r\n", "NC_035784.1\t86309411\t86309430\tDMR\t-50\tNC_035784.1\tRepeatMasker\tsimilarity\t86309412\t86309430\t 5.5\t+\t.\tTarget \"Motif:(C)n\" 1 19\r\n", "NC_035785.1\t35798179\t35798200\tDMR\t-52\tNC_035785.1\tRepeatMasker\tsimilarity\t35798180\t35798247\t31.6\t+\t.\tTarget \"Motif:GA-rich\" 1 68\r\n", "NC_035787.1\t47281023\t47281063\tDMR\t50\tNC_035787.1\tRepeatMasker\tsimilarity\t47281024\t47281063\t17.5\t-\t.\tTarget \"Motif:DNA9-6_CGi\" 758 797\r\n", "NC_035787.1\t47281063\t47281100\tDMR\t50\tNC_035787.1\tRepeatMasker\tsimilarity\t47281064\t47281118\t22.2\t+\t.\tTarget \"Motif:DNA9-6_CGi\" 744 797\r\n", "NC_035787.1\t52112659\t52112681\tDMR\t-53\tNC_035787.1\tRepeatMasker\tsimilarity\t52112660\t52112681\t 0.0\t+\t.\tTarget \"Motif:(G)n\" 1 22\r\n" ] } ], "source": [ "!head 2019-06-05-DMR-TE-all.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Hypermethylated 
DMR" ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 3\n", "hyperDMR overlaps with transposable elements (all)\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {hyperDMR} \\\n", "-b {transposableElementsAll} \\\n", "| wc -l\n", "!echo \"hyperDMR overlaps with transposable elements (all)\"" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {hyperDMR} \\\n", "-b {transposableElementsAll} \\\n", "> 2019-06-05-HyperDMR-TE-all.txt" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035781.1\t54151500\t54151601\t54\tNC_035781.1\tRepeatMasker\tsimilarity\t54150482\t54151750\t14.3\t+\t.\tTarget \"Motif:CVA\" 1 1018\r\n", "NC_035783.1\t30386535\t30386601\t52\tNC_035783.1\tRepeatMasker\tsimilarity\t30386536\t30387049\t25.2\t+\t.\tTarget \"Motif:Kolobok-N4_CGi\" 1 497\r\n", "NC_035787.1\t47281023\t47281063\t50\tNC_035787.1\tRepeatMasker\tsimilarity\t47281024\t47281063\t17.5\t-\t.\tTarget \"Motif:DNA9-6_CGi\" 758 797\r\n", "NC_035787.1\t47281063\t47281101\t50\tNC_035787.1\tRepeatMasker\tsimilarity\t47281064\t47281118\t22.2\t+\t.\tTarget \"Motif:DNA9-6_CGi\" 744 797\r\n" ] } ], "source": [ "!head 2019-06-05-HyperDMR-TE-all.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Hypomethylated DMR" ] }, { "cell_type": "code", "execution_count": 61, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 8\n", "hypoDMR overlaps with transposable elements (all)\n" ] } ], "source": [ "! 
{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {hypoDMR} \\\n", "-b {transposableElementsAll} \\\n", "| wc -l\n", "!echo \"hypoDMR overlaps with transposable elements (all)\"" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {hypoDMR} \\\n", "-b {transposableElementsAll} \\\n", "> 2019-06-05-HypoDMR-TE-all.txt" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035784.1\t41345100\t41345105\t-51\tNC_035784.1\tRepeatMasker\tsimilarity\t41345048\t41345105\t 6.9\t+\t.\tTarget \"Motif:Helitron-10_CGi\" 282 358\r\n", "NC_035784.1\t41345184\t41345201\t-51\tNC_035784.1\tRepeatMasker\tsimilarity\t41345185\t41345249\t20.3\t-\t.\tTarget \"Motif:Helitron-N42_CGi\" 1 65\r\n", "NC_035784.1\t57163819\t57163901\t-63\tNC_035784.1\tRepeatMasker\tsimilarity\t57163820\t57163967\t27.4\t+\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 179 325\r\n", "NC_035784.1\t86309411\t86309430\t-50\tNC_035784.1\tRepeatMasker\tsimilarity\t86309412\t86309430\t 5.5\t+\t.\tTarget \"Motif:(C)n\" 1 19\r\n", "NC_035785.1\t35798179\t35798201\t-52\tNC_035785.1\tRepeatMasker\tsimilarity\t35798180\t35798247\t31.6\t+\t.\tTarget \"Motif:GA-rich\" 1 68\r\n", "NC_035787.1\t52112659\t52112681\t-53\tNC_035787.1\tRepeatMasker\tsimilarity\t52112660\t52112681\t 0.0\t+\t.\tTarget \"Motif:(G)n\" 1 22\r\n", "NC_035787.1\t61149800\t61149804\t-53\tNC_035787.1\tRepeatMasker\tsimilarity\t61149705\t61149804\t26.0\t-\t.\tTarget \"Motif:DNA2-22_CGi\" 1 103\r\n", "NC_035787.1\t61149807\t61149901\t-53\tNC_035787.1\tRepeatMasker\tsimilarity\t61149808\t61149996\t18.9\t+\t.\tTarget \"Motif:CVA\" 93 277\r\n", "NC_035788.1\t56052700\t56052777\t-73\tNC_035788.1\tRepeatMasker\tsimilarity\t56052660\t56052777\t21.4\t+\t.\tTarget \"Motif:MINIME_DN\" 321 430\r\n", 
"NC_035788.1\t56052700\t56052796\t-73\tNC_035788.1\tRepeatMasker\tsimilarity\t56052681\t56052796\t21.9\t+\t.\tTarget \"Motif:(TCTG)n\" 2 117\r\n" ] } ], "source": [ "!head 2019-06-05-HypoDMR-TE-all.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### DMR Background" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 25117\n", "DMR background overlaps with transposable elements (all)\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {DMRBackground} \\\n", "-b {transposableElementsAll} \\\n", "| wc -l\n", "!echo \"DMR background overlaps with transposable elements (all)\"" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {DMRBackground} \\\n", "-b {transposableElementsAll} \\\n", "> 2019-06-05-DMRBackground-TE-all.txt" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_007175.2\t601\t700\t*\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n", "NC_007175.2\t2201\t2300\t*\tNC_007175.2\tRepeatMasker\tsimilarity\t2129\t2367\t20.5\t-\t.\tTarget \"Motif:REP-6_LMi\" 13886 14118\r\n", "NC_007175.2\t5301\t5400\t*\tNC_007175.2\tRepeatMasker\tsimilarity\t5168\t5532\t32.9\t+\t.\tTarget \"Motif:REP-6_LMi\" 4620 4983\r\n", "NC_007175.2\t5401\t5500\t*\tNC_007175.2\tRepeatMasker\tsimilarity\t5168\t5532\t32.9\t+\t.\tTarget \"Motif:REP-6_LMi\" 4620 4983\r\n", "NC_007175.2\t5501\t5532\t*\tNC_007175.2\tRepeatMasker\tsimilarity\t5168\t5532\t32.9\t+\t.\tTarget \"Motif:REP-6_LMi\" 4620 4983\r\n", "NC_007175.2\t12301\t12368\t*\tNC_007175.2\tRepeatMasker\tsimilarity\t12086\t12368\t30.0\t-\t.\tTarget \"Motif:REP-6_LMi\" 9850 10131\r\n", 
"NC_007175.2\t16531\t16600\t*\tNC_007175.2\tRepeatMasker\tsimilarity\t16532\t16610\t24.1\t-\t.\tTarget \"Motif:REP-6_LMi\" 13114 13192\r\n", "NC_007175.2\t16601\t16610\t*\tNC_007175.2\tRepeatMasker\tsimilarity\t16532\t16610\t24.1\t-\t.\tTarget \"Motif:REP-6_LMi\" 13114 13192\r\n", "NC_035780.1\t1472\t1500\t*\tNC_035780.1\tRepeatMasker\tsimilarity\t1473\t1535\t 0.0\t+\t.\tTarget \"Motif:(TAACCC)n\" 1 63\r\n", "NC_035780.1\t255031\t255100\t*\tNC_035780.1\tRepeatMasker\tsimilarity\t255032\t255159\t14.5\t-\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 211 337\r\n" ] } ], "source": [ "!head 2019-06-05-DMRBackground-TE-all.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2i. Transposable Elements (_C. gigas_ only)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### DML" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 39\n", "DML overlaps with transposable elements (Cg)\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {DMLlist} \\\n", "-b {transposableElementsCg} \\\n", "| wc -l\n", "!echo \"DML overlaps with transposable elements (Cg)\"" ] }, { "cell_type": "code", "execution_count": 76, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! 
{bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {DMLlist} \\\n", "-b {transposableElementsCg} \\\n", "> 2019-05-29-DML-TE-Cg.txt" ] }, { "cell_type": "code", "execution_count": 77, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t8833124\t8833126\t60\tNC_035780.1\tRepeatMasker\tsimilarity\t8833045\t8833287\t22.6\t-\t.\tTarget \"Motif:Helitron-N2f_CGi\" 1 276\r\n", "NC_035780.1\t22177828\t22177830\t-51\tNC_035780.1\tRepeatMasker\tsimilarity\t22177766\t22177877\t22.3\t-\t.\tTarget \"Motif:DNA9-6_CGi\" 1 115\r\n", "NC_035780.1\t57337100\t57337102\t-54\tNC_035780.1\tRepeatMasker\tsimilarity\t57337042\t57337128\t18.6\t-\t.\tTarget \"Motif:DNA2-2_CGi\" 413 498\r\n", "NC_035781.1\t29178318\t29178320\t-55\tNC_035781.1\tRepeatMasker\tsimilarity\t29177333\t29178341\t24.4\t-\t.\tTarget \"Motif:Helitron-N2d_CGi\" 2 863\r\n", "NC_035781.1\t54151548\t54151550\t54\tNC_035781.1\tRepeatMasker\tsimilarity\t54150483\t54151741\t23.3\t+\t.\tTarget \"Motif:Helitron-N2f_CGi\" 1 1018\r\n", "NC_035781.1\t59742649\t59742651\t-65\tNC_035781.1\tRepeatMasker\tsimilarity\t59742603\t59742651\t 4.2\t+\t.\tTarget \"Motif:(ACTAACG)n\" 1 49\r\n", "NC_035782.1\t34498893\t34498895\t-55\tNC_035782.1\tRepeatMasker\tsimilarity\t34498501\t34500091\t24.8\t+\t.\tTarget \"Motif:Helitron-N40_CGi\" 1 1569\r\n", "NC_035782.1\t34498895\t34498897\t-71\tNC_035782.1\tRepeatMasker\tsimilarity\t34498501\t34500091\t24.8\t+\t.\tTarget \"Motif:Helitron-N40_CGi\" 1 1569\r\n", "NC_035782.1\t53693367\t53693369\t61\tNC_035782.1\tRepeatMasker\tsimilarity\t53693299\t53693466\t19.6\t+\t.\tTarget \"Motif:Crypton-N6B_CGi\" 566 735\r\n", "NC_035782.1\t58675269\t58675271\t50\tNC_035782.1\tRepeatMasker\tsimilarity\t58675249\t58675337\t19.1\t-\t.\tTarget \"Motif:DNA3-12_CGi\" 290 378\r\n" ] } ], "source": [ "!head 2019-05-29-DML-TE-Cg.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Hypermethylated DML" ] }, { 
"cell_type": "code", "execution_count": 32, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 16\n", "hypermethylated DML overlaps with TE (Cg)\n" ] } ], "source": [ "!{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {hyperDML} \\\n", "-b {transposableElementsCg} \\\n", "| wc -l\n", "!echo \"hypermethylated DML overlaps with TE (Cg)\"" ] }, { "cell_type": "code", "execution_count": 80, "metadata": { "collapsed": false }, "outputs": [], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {hyperDML} \\\n", "-b {transposableElementsCg} \\\n", "> 2019-05-29-Hypermethylated-DML-TECg.txt" ] }, { "cell_type": "code", "execution_count": 81, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t8833124\t8833126\t60\tNC_035780.1\tRepeatMasker\tsimilarity\t8833045\t8833287\t22.6\t-\t.\tTarget \"Motif:Helitron-N2f_CGi\" 1 276\r\n", "NC_035781.1\t54151548\t54151550\t54\tNC_035781.1\tRepeatMasker\tsimilarity\t54150483\t54151741\t23.3\t+\t.\tTarget \"Motif:Helitron-N2f_CGi\" 1 1018\r\n", "NC_035782.1\t53693367\t53693369\t61\tNC_035782.1\tRepeatMasker\tsimilarity\t53693299\t53693466\t19.6\t+\t.\tTarget \"Motif:Crypton-N6B_CGi\" 566 735\r\n", "NC_035782.1\t58675269\t58675271\t50\tNC_035782.1\tRepeatMasker\tsimilarity\t58675249\t58675337\t19.1\t-\t.\tTarget \"Motif:DNA3-12_CGi\" 290 378\r\n", "NC_035782.1\t61203970\t61203972\t51\tNC_035782.1\tRepeatMasker\tsimilarity\t61203650\t61204350\t24.8\t-\t.\tTarget \"Motif:Helitron-N2d_CGi\" 1 686\r\n", "NC_035783.1\t4336100\t4336102\t63\tNC_035783.1\tRepeatMasker\tsimilarity\t4335884\t4336135\t21.8\t+\t.\tTarget \"Motif:DNA8-4_CGi\" 42 268\r\n", "NC_035783.1\t23130125\t23130127\t53\tNC_035783.1\tRepeatMasker\tsimilarity\t23130086\t23130209\t18.6\t+\t.\tTarget \"Motif:Crypton-8N1_CGi\" 516 639\r\n", 
"NC_035783.1\t29749414\t29749416\t57\tNC_035783.1\tRepeatMasker\tsimilarity\t29749359\t29749442\t10.7\t+\t.\tTarget \"Motif:ISL2EU-8_CGi\" 4977 5060\r\n", "NC_035783.1\t36586659\t36586661\t52\tNC_035783.1\tRepeatMasker\tsimilarity\t36586613\t36586945\t26.5\t+\t.\tTarget \"Motif:Helitron-N43_CGi\" 1 314\r\n", "NC_035783.1\t46125338\t46125340\t51\tNC_035783.1\tRepeatMasker\tsimilarity\t46125305\t46125481\t20.4\t+\t.\tTarget \"Motif:hAT-6N1_CGi\" 1 168\r\n" ] } ], "source": [ "!head 2019-05-29-Hypermethylated-DML-TECg.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Hypomethylated DML" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 23\n", "hypomethylated DML overlaps with TE (Cg)\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {hypoDML} \\\n", "-b {transposableElementsCg} \\\n", "| wc -l\n", "!echo \"hypomethylated DML overlaps with TE (Cg)\"" ] }, { "cell_type": "code", "execution_count": 82, "metadata": { "collapsed": false }, "outputs": [], "source": [ "! 
{bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {hypoDML} \\\n", "-b {transposableElementsCg} \\\n", "> 2019-05-29-Hypomethylated-DML-TECg.txt" ] }, { "cell_type": "code", "execution_count": 83, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t22177828\t22177830\t-51\tNC_035780.1\tRepeatMasker\tsimilarity\t22177766\t22177877\t22.3\t-\t.\tTarget \"Motif:DNA9-6_CGi\" 1 115\r\n", "NC_035780.1\t57337100\t57337102\t-54\tNC_035780.1\tRepeatMasker\tsimilarity\t57337042\t57337128\t18.6\t-\t.\tTarget \"Motif:DNA2-2_CGi\" 413 498\r\n", "NC_035781.1\t29178318\t29178320\t-55\tNC_035781.1\tRepeatMasker\tsimilarity\t29177333\t29178341\t24.4\t-\t.\tTarget \"Motif:Helitron-N2d_CGi\" 2 863\r\n", "NC_035781.1\t59742649\t59742651\t-65\tNC_035781.1\tRepeatMasker\tsimilarity\t59742603\t59742651\t 4.2\t+\t.\tTarget \"Motif:(ACTAACG)n\" 1 49\r\n", "NC_035782.1\t34498893\t34498895\t-55\tNC_035782.1\tRepeatMasker\tsimilarity\t34498501\t34500091\t24.8\t+\t.\tTarget \"Motif:Helitron-N40_CGi\" 1 1569\r\n", "NC_035782.1\t34498895\t34498897\t-71\tNC_035782.1\tRepeatMasker\tsimilarity\t34498501\t34500091\t24.8\t+\t.\tTarget \"Motif:Helitron-N40_CGi\" 1 1569\r\n", "NC_035783.1\t48434286\t48434288\t-53\tNC_035783.1\tRepeatMasker\tsimilarity\t48434172\t48434360\t26.1\t-\t.\tTarget \"Motif:DNA3-11_CGi\" 1856 2040\r\n", "NC_035783.1\t49079096\t49079097\t-50\tNC_035783.1\tRepeatMasker\tsimilarity\t49079034\t49079097\t23.4\t-\t.\tTarget \"Motif:DNA2-5_CGi\" 12 78\r\n", "NC_035784.1\t2338123\t2338125\t-53\tNC_035784.1\tRepeatMasker\tsimilarity\t2338061\t2338146\t22.1\t-\t.\tTarget \"Motif:DNA2-3_CGi\" 133 218\r\n", "NC_035784.1\t14247055\t14247057\t-59\tNC_035784.1\tRepeatMasker\tsimilarity\t14247029\t14247062\t20.2\t+\t.\tTarget \"Motif:(GGAC)n\" 1 34\r\n" ] } ], "source": [ "!head 2019-05-29-Hypomethylated-DML-TECg.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### DMR" ] }, { 
"cell_type": "code", "execution_count": 64, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 9\n", "DMR overlaps with transposable elements (Cg)\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {DMRlist} \\\n", "-b {transposableElementsCg} \\\n", "| wc -l\n", "!echo \"DMR overlaps with transposable elements (Cg)\"" ] }, { "cell_type": "code", "execution_count": 65, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {DMRlist} \\\n", "-b {transposableElementsCg} \\\n", "> 2019-06-05-DMR-TE-Cg.txt" ] }, { "cell_type": "code", "execution_count": 66, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035781.1\t54151500\t54151600\tDMR\t54\tNC_035781.1\tRepeatMasker\tsimilarity\t54150483\t54151741\t23.3\t+\t.\tTarget \"Motif:Helitron-N2f_CGi\" 1 1018\r\n", "NC_035783.1\t30386535\t30386600\tDMR\t52\tNC_035783.1\tRepeatMasker\tsimilarity\t30386536\t30387049\t25.2\t+\t.\tTarget \"Motif:Kolobok-N4_CGi\" 1 497\r\n", "NC_035784.1\t41345100\t41345105\tDMR\t-51\tNC_035784.1\tRepeatMasker\tsimilarity\t41345048\t41345105\t 6.9\t+\t.\tTarget \"Motif:Helitron-10_CGi\" 282 358\r\n", "NC_035784.1\t41345184\t41345200\tDMR\t-51\tNC_035784.1\tRepeatMasker\tsimilarity\t41345185\t41345249\t20.3\t-\t.\tTarget \"Motif:Helitron-N42_CGi\" 1 65\r\n", "NC_035784.1\t86309411\t86309430\tDMR\t-50\tNC_035784.1\tRepeatMasker\tsimilarity\t86309412\t86309430\t 5.5\t+\t.\tTarget \"Motif:(C)n\" 1 19\r\n", "NC_035785.1\t35798179\t35798200\tDMR\t-52\tNC_035785.1\tRepeatMasker\tsimilarity\t35798180\t35798247\t31.6\t+\t.\tTarget \"Motif:GA-rich\" 1 68\r\n", "NC_035787.1\t47281023\t47281063\tDMR\t50\tNC_035787.1\tRepeatMasker\tsimilarity\t47281024\t47281063\t17.5\t-\t.\tTarget \"Motif:DNA9-6_CGi\" 758 797\r\n", 
"NC_035787.1\t47281063\t47281100\tDMR\t50\tNC_035787.1\tRepeatMasker\tsimilarity\t47281064\t47281118\t22.2\t+\t.\tTarget \"Motif:DNA9-6_CGi\" 744 797\r\n", "NC_035787.1\t52112659\t52112681\tDMR\t-53\tNC_035787.1\tRepeatMasker\tsimilarity\t52112660\t52112681\t 0.0\t+\t.\tTarget \"Motif:(G)n\" 1 22\r\n", "NC_035787.1\t61149800\t61149804\tDMR\t-53\tNC_035787.1\tRepeatMasker\tsimilarity\t61149705\t61149804\t26.0\t-\t.\tTarget \"Motif:DNA2-22_CGi\" 1 103\r\n" ] } ], "source": [ "!head 2019-06-05-DMR-TE-Cg.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Hypermethylated DMR" ] }, { "cell_type": "code", "execution_count": 67, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 3\n", "hyperDMR overlaps with transposable elements (Cg)\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {hyperDMR} \\\n", "-b {transposableElementsCg} \\\n", "| wc -l\n", "!echo \"hyperDMR overlaps with transposable elements (Cg)\"" ] }, { "cell_type": "code", "execution_count": 68, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! 
{bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {hyperDMR} \\\n", "-b {transposableElementsCg} \\\n", "> 2019-06-05-HyperDMR-TE-Cg.txt" ] }, { "cell_type": "code", "execution_count": 69, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035781.1\t54151500\t54151601\t54\tNC_035781.1\tRepeatMasker\tsimilarity\t54150483\t54151741\t23.3\t+\t.\tTarget \"Motif:Helitron-N2f_CGi\" 1 1018\r\n", "NC_035783.1\t30386535\t30386601\t52\tNC_035783.1\tRepeatMasker\tsimilarity\t30386536\t30387049\t25.2\t+\t.\tTarget \"Motif:Kolobok-N4_CGi\" 1 497\r\n", "NC_035787.1\t47281023\t47281063\t50\tNC_035787.1\tRepeatMasker\tsimilarity\t47281024\t47281063\t17.5\t-\t.\tTarget \"Motif:DNA9-6_CGi\" 758 797\r\n", "NC_035787.1\t47281063\t47281101\t50\tNC_035787.1\tRepeatMasker\tsimilarity\t47281064\t47281118\t22.2\t+\t.\tTarget \"Motif:DNA9-6_CGi\" 744 797\r\n" ] } ], "source": [ "!head 2019-06-05-HyperDMR-TE-Cg.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Hypomethylated DMR" ] }, { "cell_type": "code", "execution_count": 70, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 6\n", "hypoDMR overlaps with transposable elements (Cg)\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {hypoDMR} \\\n", "-b {transposableElementsCg} \\\n", "| wc -l\n", "!echo \"hypoDMR overlaps with transposable elements (Cg)\"" ] }, { "cell_type": "code", "execution_count": 71, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! 
{bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {hypoDMR} \\\n", "-b {transposableElementsCg} \\\n", "> 2019-06-05-HypoDMR-TE-Cg.txt" ] }, { "cell_type": "code", "execution_count": 72, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035784.1\t41345100\t41345105\t-51\tNC_035784.1\tRepeatMasker\tsimilarity\t41345048\t41345105\t 6.9\t+\t.\tTarget \"Motif:Helitron-10_CGi\" 282 358\r\n", "NC_035784.1\t41345184\t41345201\t-51\tNC_035784.1\tRepeatMasker\tsimilarity\t41345185\t41345249\t20.3\t-\t.\tTarget \"Motif:Helitron-N42_CGi\" 1 65\r\n", "NC_035784.1\t86309411\t86309430\t-50\tNC_035784.1\tRepeatMasker\tsimilarity\t86309412\t86309430\t 5.5\t+\t.\tTarget \"Motif:(C)n\" 1 19\r\n", "NC_035785.1\t35798179\t35798201\t-52\tNC_035785.1\tRepeatMasker\tsimilarity\t35798180\t35798247\t31.6\t+\t.\tTarget \"Motif:GA-rich\" 1 68\r\n", "NC_035787.1\t52112659\t52112681\t-53\tNC_035787.1\tRepeatMasker\tsimilarity\t52112660\t52112681\t 0.0\t+\t.\tTarget \"Motif:(G)n\" 1 22\r\n", "NC_035787.1\t61149800\t61149804\t-53\tNC_035787.1\tRepeatMasker\tsimilarity\t61149705\t61149804\t26.0\t-\t.\tTarget \"Motif:DNA2-22_CGi\" 1 103\r\n", "NC_035787.1\t61149807\t61149901\t-53\tNC_035787.1\tRepeatMasker\tsimilarity\t61149808\t61149990\t24.0\t+\t.\tTarget \"Motif:Helitron-N2_CGi\" 127 305\r\n", "NC_035788.1\t56052700\t56052733\t-73\tNC_035788.1\tRepeatMasker\tsimilarity\t56052674\t56052733\t17.2\t+\t.\tTarget \"Motif:Helitron-N2_CGi\" 48 106\r\n", "NC_035788.1\t56052700\t56052796\t-73\tNC_035788.1\tRepeatMasker\tsimilarity\t56052681\t56052796\t21.9\t+\t.\tTarget \"Motif:(TCTG)n\" 2 117\r\n" ] } ], "source": [ "!head 2019-06-05-HypoDMR-TE-Cg.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### DMR Background" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 20228\n", "DMR background overlaps 
with transposable elements (Cg)\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {DMRBackground} \\\n", "-b {transposableElementsCg} \\\n", "| wc -l\n", "!echo \"DMR background overlaps with transposable elements (Cg)\"" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {DMRBackground} \\\n", "-b {transposableElementsCg} \\\n", "> 2019-06-05-DMRBackground-TE-Cg.txt" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t1472\t1500\t*\tNC_035780.1\tRepeatMasker\tsimilarity\t1473\t1535\t 0.0\t+\t.\tTarget \"Motif:(TAACCC)n\" 1 63\r\n", "NC_035780.1\t259831\t259900\t*\tNC_035780.1\tRepeatMasker\tsimilarity\t259832\t259930\t25.8\t-\t.\tTarget \"Motif:DNA9-7_CGi\" 1 97\r\n", "NC_035780.1\t269562\t269600\t*\tNC_035780.1\tRepeatMasker\tsimilarity\t269563\t269603\t17.1\t+\t.\tTarget \"Motif:(ATG)n\" 1 42\r\n", "NC_035780.1\t269601\t269603\t*\tNC_035780.1\tRepeatMasker\tsimilarity\t269563\t269603\t17.1\t+\t.\tTarget \"Motif:(ATG)n\" 1 42\r\n", "NC_035780.1\t270801\t270894\t*\tNC_035780.1\tRepeatMasker\tsimilarity\t270702\t270894\t22.2\t+\t.\tTarget \"Motif:DNA2-5_CGi\" 1 213\r\n", "NC_035780.1\t272001\t272062\t*\tNC_035780.1\tRepeatMasker\tsimilarity\t271965\t272062\t26.5\t+\t.\tTarget \"Motif:Kolobok-2_CGi\" 2384 2485\r\n", "NC_035780.1\t283736\t283800\t*\tNC_035780.1\tRepeatMasker\tsimilarity\t283737\t283817\t 6.7\t-\t.\tTarget \"Motif:DIRS-1_CGi\" 4930 5010\r\n", "NC_035780.1\t291341\t291377\t*\tNC_035780.1\tRepeatMasker\tsimilarity\t291342\t291377\t 8.9\t+\t.\tTarget \"Motif:(CAAGCA)n\" 1 39\r\n", "NC_035780.1\t293792\t293800\t*\tNC_035780.1\tRepeatMasker\tsimilarity\t293793\t293850\t16.3\t+\t.\tTarget \"Motif:(TCATTTT)n\" 1 58\r\n", 
"NC_035780.1\t306901\t306913\t*\tNC_035780.1\tRepeatMasker\tsimilarity\t306816\t306913\t13.4\t+\t.\tTarget \"Motif:DNA2-14C_CGi\" 1193 1289\r\n" ] } ], "source": [ "!head 2019-06-05-DMRBackground-TE-Cg.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2j. lncRNA" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### DML" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 5\n", "DML overlaps with lncRNA\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {DMLlist} \\\n", "-b {lncRNA} \\\n", "| wc -l\n", "!echo \"DML overlaps with lncRNA\"" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {DMLlist} \\\n", "-b {lncRNA} \\\n", "> 2019-05-29-DML-lncRNA.txt" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t60587461\t60587463\t-74\tNC_035780.1\tGnomon\tlnc_RNA\t60582499\t60600638\t.\t-\t.\tID=rna6160;Parent=gene3630;Dbxref=GeneID:111130677,Genbank:XR_002638971.1;Name=XR_002638971.1;gbkey=ncRNA;gene=LOC111130677;model_evidence=Supporting evidence includes similarity to: 1 EST%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 19 samples with support for all annotated introns;product=uncharacterized LOC111130677;transcript_id=XR_002638971.1\r\n", "NC_035784.1\t88305440\t88305442\t52\tNC_035784.1\tGnomon\tlnc_RNA\t88303139\t88333348\t.\t-\t.\tID=rna38906;Parent=gene22409;Dbxref=GeneID:111135652,Genbank:XR_002639633.1;Name=XR_002639633.1;gbkey=ncRNA;gene=LOC111135652;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq 
alignments;product=uncharacterized LOC111135652%2C transcript variant X3;transcript_id=XR_002639633.1\r\n", "NC_035784.1\t88305440\t88305442\t52\tNC_035784.1\tGnomon\tlnc_RNA\t88303139\t88319107\t.\t-\t.\tID=rna38907;Parent=gene22409;Dbxref=GeneID:111135652,Genbank:XR_002639632.1;Name=XR_002639632.1;gbkey=ncRNA;gene=LOC111135652;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 1 sample with support for all annotated introns;product=uncharacterized LOC111135652%2C transcript variant X2;transcript_id=XR_002639632.1\r\n", "NC_035784.1\t88305440\t88305442\t52\tNC_035784.1\tGnomon\tlnc_RNA\t88303139\t88319107\t.\t-\t.\tID=rna38908;Parent=gene22409;Dbxref=GeneID:111135652,Genbank:XR_002639634.1;Name=XR_002639634.1;gbkey=ncRNA;gene=LOC111135652;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 1 sample with support for all annotated introns;product=uncharacterized LOC111135652%2C transcript variant X4;transcript_id=XR_002639634.1\r\n", "NC_035784.1\t88305440\t88305442\t52\tNC_035784.1\tGnomon\tlnc_RNA\t88303139\t88319107\t.\t-\t.\tID=rna38909;Parent=gene22409;Dbxref=GeneID:111135652,Genbank:XR_002639631.1;Name=XR_002639631.1;gbkey=ncRNA;gene=LOC111135652;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 1 sample with support for all annotated introns;product=uncharacterized LOC111135652%2C transcript variant X1;transcript_id=XR_002639631.1\r\n", "NC_035784.1\t89193226\t89193228\t60\tNC_035784.1\tGnomon\tlnc_RNA\t89190839\t89199277\t.\t-\t.\tID=rna39000;Parent=gene22468;Dbxref=GeneID:111134994,Genbank:XR_002639546.1;Name=XR_002639546.1;gbkey=ncRNA;gene=LOC111134994;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C 
including 9 samples with support for all annotated introns;product=uncharacterized LOC111134994%2C transcript variant X2;transcript_id=XR_002639546.1\r\n", "NC_035784.1\t89193226\t89193228\t60\tNC_035784.1\tGnomon\tlnc_RNA\t89190839\t89199277\t.\t-\t.\tID=rna39001;Parent=gene22468;Dbxref=GeneID:111134994,Genbank:XR_002639545.1;Name=XR_002639545.1;gbkey=ncRNA;gene=LOC111134994;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 14 samples with support for all annotated introns;product=uncharacterized LOC111134994%2C transcript variant X1;transcript_id=XR_002639545.1\r\n", "NC_035785.1\t11013150\t11013152\t-52\tNC_035785.1\tGnomon\tlnc_RNA\t10987147\t11028325\t.\t+\t.\tID=rna40771;Parent=gene23511;Dbxref=GeneID:111101686,Genbank:XR_002634366.1;Name=XR_002634366.1;gbkey=ncRNA;gene=LOC111101686;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 1 sample with support for all annotated introns;product=uncharacterized LOC111101686%2C transcript variant X1;transcript_id=XR_002634366.1\r\n", "NC_035785.1\t11013150\t11013152\t-52\tNC_035785.1\tGnomon\tlnc_RNA\t10989103\t11028325\t.\t+\t.\tID=rna40772;Parent=gene23511;Dbxref=GeneID:111101686,Genbank:XR_002634367.1;Name=XR_002634367.1;gbkey=ncRNA;gene=LOC111101686;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 1 sample with support for all annotated introns;product=uncharacterized LOC111101686%2C transcript variant X2;transcript_id=XR_002634367.1\r\n", "NC_035788.1\t67025454\t67025456\t-56\tNC_035788.1\tGnomon\tlnc_RNA\t67004372\t67026708\t.\t+\t.\tID=rna61981;Parent=gene36151;Dbxref=GeneID:111112646,Genbank:XR_002636307.1;Name=XR_002636307.1;gbkey=ncRNA;gene=LOC111112646;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the 
annotated genomic feature by RNAseq alignments%2C including 4 samples with support for all annotated introns;product=uncharacterized LOC111112646%2C transcript variant X2;transcript_id=XR_002636307.1\r\n" ] } ], "source": [ "!head 2019-05-29-DML-lncRNA.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2k. Intergenic regions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### DML" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 38\n", "DML overlaps with intergenic regions\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {DMLlist} \\\n", "-b {intergenic} \\\n", "| wc -l\n", "!echo \"DML overlaps with intergenic regions\"" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {DMLlist} \\\n", "-b {intergenic} \\\n", "> 2019-05-29-DML-intergenic.txt" ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t58135767\t58135769\t74\tNC_035780.1\t58134019\t58135922\r\n", "NC_035781.1\t20620123\t20620125\t57\tNC_035781.1\t20613318\t20621317\r\n", "NC_035781.1\t29178318\t29178320\t-55\tNC_035781.1\t29172604\t29197233\r\n", "NC_035781.1\t30062222\t30062224\t60\tNC_035781.1\t30061881\t30064434\r\n", "NC_035781.1\t31150010\t31150012\t53\tNC_035781.1\t31149350\t31150662\r\n", "NC_035781.1\t39583208\t39583210\t-50\tNC_035781.1\t39580797\t39587601\r\n", "NC_035781.1\t50711254\t50711256\t-71\tNC_035781.1\t50706175\t50720081\r\n", "NC_035781.1\t54151548\t54151550\t54\tNC_035781.1\t54147835\t54154389\r\n", "NC_035782.1\t6685343\t6685345\t-68\tNC_035782.1\t6681158\t6685982\r\n", "NC_035782.1\t6685349\t6685351\t-50\tNC_035782.1\t6681158\t6685982\r\n" ] 
} ], "source": [ "!head 2019-05-29-DML-intergenic.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2l. Methylation Islands" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### DML" ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 537\n", "DML overlaps with methylation islands\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {DMLlist} \\\n", "-b {methylationIslands} \\\n", "| wc -l\n", "!echo \"DML overlaps with methylation islands\"" ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {DMLlist} \\\n", "-b {methylationIslands} \\\n", "> 2020-02-06-DML-MI.txt" ] }, { "cell_type": "code", "execution_count": 59, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t401630\t401632\t53\tNC_035780.1\t380654\t423774\r\n", "NC_035780.1\t571138\t571140\t58\tNC_035780.1\t545053\t645842\r\n", "NC_035780.1\t1882691\t1882693\t64\tNC_035780.1\t1880812\t1901214\r\n", "NC_035780.1\t1885022\t1885024\t61\tNC_035780.1\t1880812\t1901214\r\n", "NC_035780.1\t1933499\t1933500\t51\tNC_035780.1\t1932666\t1933500\r\n", "NC_035780.1\t2538924\t2538926\t-50\tNC_035780.1\t2524316\t2546649\r\n", "NC_035780.1\t2541726\t2541728\t-54\tNC_035780.1\t2524316\t2546649\r\n", "NC_035780.1\t2584492\t2584494\t56\tNC_035780.1\t2557126\t2603667\r\n", "NC_035780.1\t2586508\t2586510\t-53\tNC_035780.1\t2557126\t2603667\r\n", "NC_035780.1\t2589720\t2589722\t57\tNC_035780.1\t2557126\t2603667\r\n" ] } ], "source": [ "!head 2020-02-06-DML-MI.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Hypermethylated DML" ] }, { "cell_type": "code", "execution_count": 60, "metadata": { "collapsed": false }, "outputs": [ { 
"name": "stdout", "output_type": "stream", "text": [ " 283\n", "hypermethylated DML overlaps with MI\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {hyperDML} \\\n", "-b {methylationIslands} \\\n", "| wc -l\n", "!echo \"hypermethylated DML overlaps with MI\"" ] }, { "cell_type": "code", "execution_count": 61, "metadata": { "collapsed": false }, "outputs": [], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {hyperDML} \\\n", "-b {methylationIslands} \\\n", "> 2020-02-06-Hypermethylated-DML-MI.txt" ] }, { "cell_type": "code", "execution_count": 62, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t401630\t401632\t53\tNC_035780.1\t380654\t423774\r\n", "NC_035780.1\t571138\t571140\t58\tNC_035780.1\t545053\t645842\r\n", "NC_035780.1\t1882691\t1882693\t64\tNC_035780.1\t1880812\t1901214\r\n", "NC_035780.1\t1885022\t1885024\t61\tNC_035780.1\t1880812\t1901214\r\n", "NC_035780.1\t1933499\t1933500\t51\tNC_035780.1\t1932666\t1933500\r\n", "NC_035780.1\t2584492\t2584494\t56\tNC_035780.1\t2557126\t2603667\r\n", "NC_035780.1\t2589720\t2589722\t57\tNC_035780.1\t2557126\t2603667\r\n", "NC_035780.1\t4286286\t4286288\t67\tNC_035780.1\t4283641\t4311592\r\n", "NC_035780.1\t8833124\t8833126\t60\tNC_035780.1\t8832978\t8833524\r\n", "NC_035780.1\t12631453\t12631455\t60\tNC_035780.1\t12631071\t12632347\r\n" ] } ], "source": [ "!head 2020-02-06-Hypermethylated-DML-MI.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Hypomethylated DML" ] }, { "cell_type": "code", "execution_count": 63, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 254\n", "hypomethylated DML overlaps with MI\n" ] } ], "source": [ "! 
{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {hypoDML} \\\n", "-b {methylationIslands} \\\n", "| wc -l\n", "!echo \"hypomethylated DML overlaps with MI\"" ] }, { "cell_type": "code", "execution_count": 64, "metadata": { "collapsed": false }, "outputs": [], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {hypoDML} \\\n", "-b {methylationIslands} \\\n", "> 2020-02-06-Hypomethylated-DML-MI.txt" ] }, { "cell_type": "code", "execution_count": 65, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t2538924\t2538926\t-50\tNC_035780.1\t2524316\t2546649\r\n", "NC_035780.1\t2541726\t2541728\t-54\tNC_035780.1\t2524316\t2546649\r\n", "NC_035780.1\t2586508\t2586510\t-53\tNC_035780.1\t2557126\t2603667\r\n", "NC_035780.1\t4286802\t4286804\t-62\tNC_035780.1\t4283641\t4311592\r\n", "NC_035780.1\t4288213\t4288215\t-58\tNC_035780.1\t4283641\t4311592\r\n", "NC_035780.1\t4289628\t4289630\t-52\tNC_035780.1\t4283641\t4311592\r\n", "NC_035780.1\t8693287\t8693289\t-52\tNC_035780.1\t8693236\t8694022\r\n", "NC_035780.1\t9110274\t9110276\t-63\tNC_035780.1\t9109570\t9110353\r\n", "NC_035780.1\t17093218\t17093220\t-52\tNC_035780.1\t17092755\t17093973\r\n", "NC_035780.1\t17488958\t17488960\t-57\tNC_035780.1\t17488237\t17489213\r\n" ] } ], "source": [ "!head 2020-02-06-Hypomethylated-DML-MI.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Identify Overlaps between Other Genome Feature Tracks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I began some of this work in [this Jupyter notebook](https://github.com/fish546-2018/yaamini-virginica/blob/master/notebooks/2019-05-13-Generating-Genome-Feature-Tracks.ipynb) for CG motif overlaps with genomic feature tracks. Now I'll continue this for other tracks." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3a. 
Transposable Elements (all)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To fully understand my results, I also need to know where TEs are located with respect to exons, introns, and genes." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Exons" ] }, { "cell_type": "code", "execution_count": 85, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 50331\n", "Exon overlaps with transposable elements (all)\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {exonList} \\\n", "-b {transposableElementsAll} \\\n", "| wc -l\n", "!echo \"Exon overlaps with transposable elements (all)\"" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-wo \\\n", "-a {exonList} \\\n", "-b {transposableElementsAll} \\\n", "> 2018-11-07-Exon-TE-all.txt" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t108305\t110077\tNC_035780.1\tRepeatMasker\tsimilarity\t109968\t109996\t 0.0\t+\t.\tTarget \"Motif:(CCT)n\" 1 29\t29\r\n", "NC_035780.1\t164820\t164941\tNC_035780.1\tRepeatMasker\tsimilarity\t164886\t164914\t 7.3\t+\t.\tTarget \"Motif:(GAG)n\" 1 29\t29\r\n", "NC_035780.1\t165620\t166793\tNC_035780.1\tRepeatMasker\tsimilarity\t166075\t166280\t32.8\t+\t.\tTarget \"Motif:Harbinger1_DR\" 1472 1676\t206\r\n", "NC_035780.1\t165620\t166793\tNC_035780.1\tRepeatMasker\tsimilarity\t166501\t166566\t30.3\t+\t.\tTarget \"Motif:Harbinger-6_DR\" 1152 1217\t66\r\n", "NC_035780.1\t165620\t166793\tNC_035780.1\tRepeatMasker\tsimilarity\t166598\t166642\t17.8\t+\t.\tTarget \"Motif:hATw-1_HM\" 2778 2822\t45\r\n", "NC_035780.1\t219451\t220204\tNC_035780.1\tRepeatMasker\tsimilarity\t220122\t220199\t24.7\t-\t.\tTarget \"Motif:Gypsy-75_CQ-I\" 1012 1091\t78\r\n", 
"NC_035780.1\t227734\t228033\tNC_035780.1\tRepeatMasker\tsimilarity\t227768\t227819\t25.0\t+\t.\tTarget \"Motif:A-rich\" 1 54\t52\r\n", "NC_035780.1\t227734\t228033\tNC_035780.1\tRepeatMasker\tsimilarity\t227768\t227819\t25.0\t+\t.\tTarget \"Motif:A-rich\" 1 54\t52\r\n", "NC_035780.1\t227734\t228033\tNC_035780.1\tRepeatMasker\tsimilarity\t227768\t227819\t25.0\t+\t.\tTarget \"Motif:A-rich\" 1 54\t52\r\n", "NC_035780.1\t228331\t228520\tNC_035780.1\tRepeatMasker\tsimilarity\t228342\t228392\t20.0\t+\t.\tTarget \"Motif:RTE-3_Hmel\" 1405 1455\t51\r\n" ] } ], "source": [ "!head 2018-11-07-Exon-TE-all.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Introns" ] }, { "cell_type": "code", "execution_count": 86, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 115151\n", "Intron overlaps with transposable elements (all)\n" ] } ], "source": [ "!{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {intronList} \\\n", "-b {transposableElementsAll} \\\n", "| wc -l\n", "!echo \"Intron overlaps with transposable elements (all)\"" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! 
{bedtoolsDirectory}intersectBed \\\n", "-wo \\\n", "-a {intronList} \\\n", "-b {transposableElementsAll} \\\n", "> 2018-11-07-Intron-TE-all.txt" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t32565\t32958\tNC_035780.1\tRepeatMasker\tsimilarity\t32720\t32819\t18.2\t+\t.\tTarget \"Motif:Crypton-9N1_CGi\" 239 337\t100\r\n", "NC_035780.1\t46506\t64122\tNC_035780.1\tRepeatMasker\tsimilarity\t48463\t48520\t 8.8\t+\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 280 337\t58\r\n", "NC_035780.1\t46506\t64122\tNC_035780.1\tRepeatMasker\tsimilarity\t48666\t49000\t10.9\t-\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 1 337\t335\r\n", "NC_035780.1\t46506\t64122\tNC_035780.1\tRepeatMasker\tsimilarity\t50251\t50279\t 0.0\t+\t.\tTarget \"Motif:(GGTTAG)n\" 1 29\t29\r\n", "NC_035780.1\t46506\t64122\tNC_035780.1\tRepeatMasker\tsimilarity\t50606\t50760\t21.3\t+\t.\tTarget \"Motif:Harbinger-2N1_CGi\" 1 166\t155\r\n", "NC_035780.1\t46506\t64122\tNC_035780.1\tRepeatMasker\tsimilarity\t50977\t51034\t 0.0\t+\t.\tTarget \"Motif:(TA)n\" 1 58\t58\r\n", "NC_035780.1\t46506\t64122\tNC_035780.1\tRepeatMasker\tsimilarity\t51456\t51498\t 0.0\t+\t.\tTarget \"Motif:(AG)n\" 1 43\t43\r\n", "NC_035780.1\t46506\t64122\tNC_035780.1\tRepeatMasker\tsimilarity\t51721\t51922\t21.8\t+\t.\tTarget \"Motif:Harbinger-2N1_CGi\" 2568 2776\t202\r\n", "NC_035780.1\t46506\t64122\tNC_035780.1\tRepeatMasker\tsimilarity\t53156\t53294\t20.9\t+\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 127 306\t139\r\n", "NC_035780.1\t85777\t88422\tNC_035780.1\tRepeatMasker\tsimilarity\t86825\t86942\t26.5\t-\t.\tTarget \"Motif:CVA\" 81 203\t118\r\n" ] } ], "source": [ "!head 2018-11-07-Intron-TE-all.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Genes" ] }, { "cell_type": "code", "execution_count": 92, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 
33739\n", "gene overlaps with transposable elements (all)\n" ] } ], "source": [ "!{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {geneList} \\\n", "-b {transposableElementsAll} \\\n", "| wc -l\n", "!echo \"gene overlaps with transposable elements (all)\"" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-wo \\\n", "-a {geneList} \\\n", "-b {transposableElementsAll} \\\n", "> 2018-11-07-Genes-TE-all.txt" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t28961\t33324\tNC_035780.1\tRepeatMasker\tsimilarity\t32720\t32819\t18.2\t+\t.\tTarget \"Motif:Crypton-9N1_CGi\" 239 337\t100\r\n", "NC_035780.1\t43111\t66897\tNC_035780.1\tRepeatMasker\tsimilarity\t48463\t48520\t 8.8\t+\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 280 337\t58\r\n", "NC_035780.1\t43111\t66897\tNC_035780.1\tRepeatMasker\tsimilarity\t48666\t49000\t10.9\t-\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 1 337\t335\r\n", "NC_035780.1\t43111\t66897\tNC_035780.1\tRepeatMasker\tsimilarity\t50251\t50279\t 0.0\t+\t.\tTarget \"Motif:(GGTTAG)n\" 1 29\t29\r\n", "NC_035780.1\t43111\t66897\tNC_035780.1\tRepeatMasker\tsimilarity\t50606\t50760\t21.3\t+\t.\tTarget \"Motif:Harbinger-2N1_CGi\" 1 166\t155\r\n", "NC_035780.1\t43111\t66897\tNC_035780.1\tRepeatMasker\tsimilarity\t50977\t51034\t 0.0\t+\t.\tTarget \"Motif:(TA)n\" 1 58\t58\r\n", "NC_035780.1\t43111\t66897\tNC_035780.1\tRepeatMasker\tsimilarity\t51456\t51498\t 0.0\t+\t.\tTarget \"Motif:(AG)n\" 1 43\t43\r\n", "NC_035780.1\t43111\t66897\tNC_035780.1\tRepeatMasker\tsimilarity\t51721\t51922\t21.8\t+\t.\tTarget \"Motif:Harbinger-2N1_CGi\" 2568 2776\t202\r\n", "NC_035780.1\t43111\t66897\tNC_035780.1\tRepeatMasker\tsimilarity\t53156\t53294\t20.9\t+\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 127 306\t139\r\n", 
"NC_035780.1\t85606\t95254\tNC_035780.1\tRepeatMasker\tsimilarity\t86825\t86942\t26.5\t-\t.\tTarget \"Motif:CVA\" 81 203\t118\r\n" ] } ], "source": [ "!head 2018-11-07-Genes-TE-all.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### CG motifs" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 2828372\n", "CG motif overlaps with transposable elements (all)\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {CGMotifList} \\\n", "-b {transposableElementsAll} \\\n", "| wc -l\n", "!echo \"CG motif overlaps with transposable elements (all)\"" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-wo \\\n", "-a {CGMotifList} \\\n", "-b {transposableElementsAll} \\\n", "> 2018-11-07-TE-all-CGmotif.txt" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t5078\t5080\tCG_motif\tNC_035780.1\tRepeatMasker\tsimilarity\t5080\t7289\t32.5\t-\t.\tTarget \"Motif:Gypsy-62_CGi-I\" 2102 4631\t1\r\n", "NC_035780.1\t5159\t5161\tCG_motif\tNC_035780.1\tRepeatMasker\tsimilarity\t5080\t7289\t32.5\t-\t.\tTarget \"Motif:Gypsy-62_CGi-I\" 2102 4631\t2\r\n", "NC_035780.1\t5162\t5164\tCG_motif\tNC_035780.1\tRepeatMasker\tsimilarity\t5080\t7289\t32.5\t-\t.\tTarget \"Motif:Gypsy-62_CGi-I\" 2102 4631\t2\r\n", "NC_035780.1\t5174\t5176\tCG_motif\tNC_035780.1\tRepeatMasker\tsimilarity\t5080\t7289\t32.5\t-\t.\tTarget \"Motif:Gypsy-62_CGi-I\" 2102 4631\t2\r\n", "NC_035780.1\t5191\t5193\tCG_motif\tNC_035780.1\tRepeatMasker\tsimilarity\t5080\t7289\t32.5\t-\t.\tTarget \"Motif:Gypsy-62_CGi-I\" 2102 4631\t2\r\n", "NC_035780.1\t5220\t5222\tCG_motif\tNC_035780.1\tRepeatMasker\tsimilarity\t5080\t7289\t32.5\t-\t.\tTarget \"Motif:Gypsy-62_CGi-I\" 
2102 4631\t2\r\n", "NC_035780.1\t5317\t5319\tCG_motif\tNC_035780.1\tRepeatMasker\tsimilarity\t5080\t7289\t32.5\t-\t.\tTarget \"Motif:Gypsy-62_CGi-I\" 2102 4631\t2\r\n", "NC_035780.1\t5357\t5359\tCG_motif\tNC_035780.1\tRepeatMasker\tsimilarity\t5080\t7289\t32.5\t-\t.\tTarget \"Motif:Gypsy-62_CGi-I\" 2102 4631\t2\r\n", "NC_035780.1\t5381\t5383\tCG_motif\tNC_035780.1\tRepeatMasker\tsimilarity\t5080\t7289\t32.5\t-\t.\tTarget \"Motif:Gypsy-62_CGi-I\" 2102 4631\t2\r\n", "NC_035780.1\t5398\t5400\tCG_motif\tNC_035780.1\tRepeatMasker\tsimilarity\t5080\t7289\t32.5\t-\t.\tTarget \"Motif:Gypsy-62_CGi-I\" 2102 4631\t2\r\n" ] } ], "source": [ "!head 2018-11-07-TE-all-CGmotif.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3b. Transposable Elements (_C. gigas_ only)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Exons" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 41511\n", "Exon overlaps with transposable elements (Cg)\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {exonList} \\\n", "-b {transposableElementsCg} \\\n", "| wc -l\n", "!echo \"Exon overlaps with transposable elements (Cg)\"" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! 
{bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {exonList} \\\n", "-b {transposableElementsCg} \\\n", "> 2018-11-07-Exon-TE-Cg.txt" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t109967\t109996\tNC_035780.1\tRepeatMasker\tsimilarity\t109968\t109996\t 0.0\t+\t.\tTarget \"Motif:(CCT)n\" 1 29\r\n", "NC_035780.1\t164885\t164914\tNC_035780.1\tRepeatMasker\tsimilarity\t164886\t164914\t 7.3\t+\t.\tTarget \"Motif:(GAG)n\" 1 29\r\n", "NC_035780.1\t227767\t227819\tNC_035780.1\tRepeatMasker\tsimilarity\t227768\t227819\t25.0\t+\t.\tTarget \"Motif:A-rich\" 1 54\r\n", "NC_035780.1\t227767\t227819\tNC_035780.1\tRepeatMasker\tsimilarity\t227768\t227819\t25.0\t+\t.\tTarget \"Motif:A-rich\" 1 54\r\n", "NC_035780.1\t227767\t227819\tNC_035780.1\tRepeatMasker\tsimilarity\t227768\t227819\t25.0\t+\t.\tTarget \"Motif:A-rich\" 1 54\r\n", "NC_035780.1\t233475\t233478\tNC_035780.1\tRepeatMasker\tsimilarity\t233445\t233478\t10.1\t+\t.\tTarget \"Motif:(CCTTT)n\" 1 35\r\n", "NC_035780.1\t232863\t233028\tNC_035780.1\tRepeatMasker\tsimilarity\t232798\t233028\t29.7\t-\t.\tTarget \"Motif:ISL2EU-N8_CGi\" 15 237\r\n", "NC_035780.1\t269562\t269603\tNC_035780.1\tRepeatMasker\tsimilarity\t269563\t269603\t17.1\t+\t.\tTarget \"Motif:(ATG)n\" 1 42\r\n", "NC_035780.1\t258539\t258574\tNC_035780.1\tRepeatMasker\tsimilarity\t258540\t258574\t16.3\t+\t.\tTarget \"Motif:(ATACAAT)n\" 1 36\r\n", "NC_035780.1\t269562\t269603\tNC_035780.1\tRepeatMasker\tsimilarity\t269563\t269603\t17.1\t+\t.\tTarget \"Motif:(ATG)n\" 1 42\r\n" ] } ], "source": [ "!head 2018-11-07-Exon-TE-Cg.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Introns" ] }, { "cell_type": "code", "execution_count": 100, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 107542\n", "Intron overlaps with transposable elements (Cg)\n" ] } ], "source": [ "! 
{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {intronList} \\\n", "-b {transposableElementsCg} \\\n", "| wc -l\n", "!echo \"Intron overlaps with transposable elements (Cg)\"" ] }, { "cell_type": "code", "execution_count": 101, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {intronList} \\\n", "-b {transposableElementsCg} \\\n", "> 2018-11-07-Intron-TE-Cg.txt" ] }, { "cell_type": "code", "execution_count": 102, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t32719\t32819\tNC_035780.1\tRepeatMasker\tsimilarity\t32720\t32819\t18.2\t+\t.\tTarget \"Motif:Crypton-9N1_CGi\" 239 337\r\n", "NC_035780.1\t46753\t46805\tNC_035780.1\tRepeatMasker\tsimilarity\t46754\t46805\t 6.8\t+\t.\tTarget \"Motif:DNA-22_CGi\" 631 722\r\n", "NC_035780.1\t50250\t50279\tNC_035780.1\tRepeatMasker\tsimilarity\t50251\t50279\t 0.0\t+\t.\tTarget \"Motif:(GGTTAG)n\" 1 29\r\n", "NC_035780.1\t50605\t50760\tNC_035780.1\tRepeatMasker\tsimilarity\t50606\t50760\t21.3\t+\t.\tTarget \"Motif:Harbinger-2N1_CGi\" 1 166\r\n", "NC_035780.1\t50976\t51034\tNC_035780.1\tRepeatMasker\tsimilarity\t50977\t51034\t 0.0\t+\t.\tTarget \"Motif:(TA)n\" 1 58\r\n", "NC_035780.1\t51455\t51498\tNC_035780.1\tRepeatMasker\tsimilarity\t51456\t51498\t 0.0\t+\t.\tTarget \"Motif:(AG)n\" 1 43\r\n", "NC_035780.1\t51720\t51922\tNC_035780.1\tRepeatMasker\tsimilarity\t51721\t51922\t21.8\t+\t.\tTarget \"Motif:Harbinger-2N1_CGi\" 2568 2776\r\n", "NC_035780.1\t86839\t86942\tNC_035780.1\tRepeatMasker\tsimilarity\t86840\t86942\t27.4\t-\t.\tTarget \"Motif:Helitron-N14_CGi\" 83 189\r\n", "NC_035780.1\t87408\t87513\tNC_035780.1\tRepeatMasker\tsimilarity\t87409\t87513\t19.8\t-\t.\tTarget \"Motif:Helitron-7N1_CGi\" 748 850\r\n", "NC_035780.1\t87525\t87837\tNC_035780.1\tRepeatMasker\tsimilarity\t87526\t87837\t24.3\t-\t.\tTarget \"Motif:DNA3-12_CGi\" 60 378\r\n" ] } ], "source": [ "!head 
2018-11-07-Intron-TE-Cg.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Genes" ] }, { "cell_type": "code", "execution_count": 97, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 32705\n", "gene overlaps with transposable elements (Cg)\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {geneList} \\\n", "-b {transposableElementsCg} \\\n", "| wc -l\n", "!echo \"gene overlaps with transposable elements (Cg)\"" ] }, { "cell_type": "code", "execution_count": 98, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {geneList} \\\n", "-b {transposableElementsCg} \\\n", "> 2018-11-07-Gene-TE-Cg.txt" ] }, { "cell_type": "code", "execution_count": 99, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t32719\t32819\tNC_035780.1\tRepeatMasker\tsimilarity\t32720\t32819\t18.2\t+\t.\tTarget \"Motif:Crypton-9N1_CGi\" 239 337\r\n", "NC_035780.1\t46753\t46805\tNC_035780.1\tRepeatMasker\tsimilarity\t46754\t46805\t 6.8\t+\t.\tTarget \"Motif:DNA-22_CGi\" 631 722\r\n", "NC_035780.1\t50250\t50279\tNC_035780.1\tRepeatMasker\tsimilarity\t50251\t50279\t 0.0\t+\t.\tTarget \"Motif:(GGTTAG)n\" 1 29\r\n", "NC_035780.1\t50605\t50760\tNC_035780.1\tRepeatMasker\tsimilarity\t50606\t50760\t21.3\t+\t.\tTarget \"Motif:Harbinger-2N1_CGi\" 1 166\r\n", "NC_035780.1\t50976\t51034\tNC_035780.1\tRepeatMasker\tsimilarity\t50977\t51034\t 0.0\t+\t.\tTarget \"Motif:(TA)n\" 1 58\r\n", "NC_035780.1\t51455\t51498\tNC_035780.1\tRepeatMasker\tsimilarity\t51456\t51498\t 0.0\t+\t.\tTarget \"Motif:(AG)n\" 1 43\r\n", "NC_035780.1\t51720\t51922\tNC_035780.1\tRepeatMasker\tsimilarity\t51721\t51922\t21.8\t+\t.\tTarget \"Motif:Harbinger-2N1_CGi\" 2568 2776\r\n", "NC_035780.1\t86839\t86942\tNC_035780.1\tRepeatMasker\tsimilarity\t86840\t86942\t27.4\t-\t.\tTarget 
\"Motif:Helitron-N14_CGi\" 83 189\r\n", "NC_035780.1\t87408\t87513\tNC_035780.1\tRepeatMasker\tsimilarity\t87409\t87513\t19.8\t-\t.\tTarget \"Motif:Helitron-7N1_CGi\" 748 850\r\n", "NC_035780.1\t87525\t87837\tNC_035780.1\tRepeatMasker\tsimilarity\t87526\t87837\t24.3\t-\t.\tTarget \"Motif:DNA3-12_CGi\" 60 378\r\n" ] } ], "source": [ "!head 2018-11-07-Gene-TE-Cg.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### CG motifs" ] }, { "cell_type": "code", "execution_count": 84, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 2142774\n", "CG motif overlaps with transposable elements (Cg)\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {CGMotifList} \\\n", "-b {transposableElementsCg} \\\n", "| wc -l\n", "!echo \"CG motif overlaps with transposable elements (Cg)\"" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "collapsed": false }, "outputs": [], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {CGMotifList} \\\n", "-b {transposableElementsCg} \\\n", "> 2018-11-07-TE-Cg-CGmotif.txt" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t5079\t5080\tCG_motif\tNC_035780.1\tRepeatMasker\tsimilarity\t5080\t7289\t32.5\t-\t.\tTarget \"Motif:Gypsy-62_CGi-I\" 2102 4631\r\n", "NC_035780.1\t5159\t5161\tCG_motif\tNC_035780.1\tRepeatMasker\tsimilarity\t5080\t7289\t32.5\t-\t.\tTarget \"Motif:Gypsy-62_CGi-I\" 2102 4631\r\n", "NC_035780.1\t5162\t5164\tCG_motif\tNC_035780.1\tRepeatMasker\tsimilarity\t5080\t7289\t32.5\t-\t.\tTarget \"Motif:Gypsy-62_CGi-I\" 2102 4631\r\n", "NC_035780.1\t5174\t5176\tCG_motif\tNC_035780.1\tRepeatMasker\tsimilarity\t5080\t7289\t32.5\t-\t.\tTarget \"Motif:Gypsy-62_CGi-I\" 2102 4631\r\n", 
"NC_035780.1\t5191\t5193\tCG_motif\tNC_035780.1\tRepeatMasker\tsimilarity\t5080\t7289\t32.5\t-\t.\tTarget \"Motif:Gypsy-62_CGi-I\" 2102 4631\r\n", "NC_035780.1\t5220\t5222\tCG_motif\tNC_035780.1\tRepeatMasker\tsimilarity\t5080\t7289\t32.5\t-\t.\tTarget \"Motif:Gypsy-62_CGi-I\" 2102 4631\r\n", "NC_035780.1\t5317\t5319\tCG_motif\tNC_035780.1\tRepeatMasker\tsimilarity\t5080\t7289\t32.5\t-\t.\tTarget \"Motif:Gypsy-62_CGi-I\" 2102 4631\r\n", "NC_035780.1\t5357\t5359\tCG_motif\tNC_035780.1\tRepeatMasker\tsimilarity\t5080\t7289\t32.5\t-\t.\tTarget \"Motif:Gypsy-62_CGi-I\" 2102 4631\r\n", "NC_035780.1\t5381\t5383\tCG_motif\tNC_035780.1\tRepeatMasker\tsimilarity\t5080\t7289\t32.5\t-\t.\tTarget \"Motif:Gypsy-62_CGi-I\" 2102 4631\r\n", "NC_035780.1\t5398\t5400\tCG_motif\tNC_035780.1\tRepeatMasker\tsimilarity\t5080\t7289\t32.5\t-\t.\tTarget \"Motif:Gypsy-62_CGi-I\" 2102 4631\r\n" ] } ], "source": [ "!head 2018-11-07-TE-Cg-CGmotif.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3c. Exons" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To help with downstream annotations, I also want to look at exon overlaps with genes." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 731279\n", "exon overlaps with genes\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {exonList} \\\n", "-b {geneList} \\\n", "| wc -l\n", "!echo \"exon overlaps with genes\"" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [], "source": [ "! 
{bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {exonList} \\\n", "-b {geneList} \\\n", "> 2019-06-20-Exon-Gene.txt" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t13578\t13603\tNC_035780.1\t13578\t14594\r\n", "NC_035780.1\t14237\t14290\tNC_035780.1\t13578\t14594\r\n", "NC_035780.1\t14557\t14594\tNC_035780.1\t13578\t14594\r\n", "NC_035780.1\t28961\t29073\tNC_035780.1\t28961\t33324\r\n", "NC_035780.1\t30524\t31557\tNC_035780.1\t28961\t33324\r\n", "NC_035780.1\t31736\t31887\tNC_035780.1\t28961\t33324\r\n", "NC_035780.1\t31977\t32565\tNC_035780.1\t28961\t33324\r\n", "NC_035780.1\t32959\t33324\tNC_035780.1\t28961\t33324\r\n", "NC_035780.1\t43111\t44358\tNC_035780.1\t43111\t66897\r\n", "NC_035780.1\t43111\t44358\tNC_035780.1\t43111\t66897\r\n" ] } ], "source": [ "!head 2019-06-20-Exon-Gene.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3d. Introns" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 316614\n", "intron overlaps with genes\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {intronList} \\\n", "-b {geneList} \\\n", "| wc -l\n", "!echo \"intron overlaps with genes\"" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [], "source": [ "! 
{bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {intronList} \\\n", "-b {geneList} \\\n", "> 2019-06-20-Intron-Gene.txt" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t13603\t14236\tNC_035780.1\t13578\t14594\r\n", "NC_035780.1\t14290\t14556\tNC_035780.1\t13578\t14594\r\n", "NC_035780.1\t29073\t30523\tNC_035780.1\t28961\t33324\r\n", "NC_035780.1\t31557\t31735\tNC_035780.1\t28961\t33324\r\n", "NC_035780.1\t31887\t31976\tNC_035780.1\t28961\t33324\r\n", "NC_035780.1\t32565\t32958\tNC_035780.1\t28961\t33324\r\n", "NC_035780.1\t44358\t45912\tNC_035780.1\t43111\t66897\r\n", "NC_035780.1\t46506\t64122\tNC_035780.1\t43111\t66897\r\n", "NC_035780.1\t64334\t66868\tNC_035780.1\t43111\t66897\r\n", "NC_035780.1\t85777\t88422\tNC_035780.1\t85606\t95254\r\n" ] } ], "source": [ "!head 2019-06-20-Intron-Gene.txt" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## 4. Gene Flanking" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I will perform a flanking analysis in two ways. First, I will use `bedtools flank` to add 1000 bp flanking regions to each mRNA coding region. I can then isolate these flanks and intersect them with various genomic feature files. Second, I will use `bedtools closest` to find the closest non-overlapping DML or DMR to each mRNA coding region." ] }, { "cell_type": "code", "execution_count": 110, "metadata": { "collapsed": true }, "outputs": [], "source": [ "mkdir 2019-05-29-Flanking-Analysis #Create a new directory for flanking analysis output" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4a. `flank`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I also need to know if DMLs and CG motifs overlap with regions that flank mRNA. These flanking regions could contain promoters or transcription factor binding sites that regulate transcription. 
To do this, I will use `bedtools flank`:\n", "\n", "1. Path to `flankBed`\n", "2. -i: Path to mRNA GFF file\n", "3. -g: Path to C. virginica \"genome\" file. flankBed requires the length of each chromosome (see this issue). I created this file in TextWrangler using chromosome lengths from NCBI.\n", "4. -b 1000: Add 1000 bp flanks to both ends of each coding region" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [], "source": [ "! {bedtoolsDirectory}flankBed \\\n", "-i {mRNAList} \\\n", "-g 2018-11-14-Flanking-Analysis/2018-11-14-bedtools-Chromosome-Length.txt \\\n", "-b 1000 \\\n", "> 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-Flanks.bed" ] }, { "cell_type": "code", "execution_count": 112, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t28961\t33324\r\n", "NC_035780.1\t43111\t66897\r\n", "NC_035780.1\t43111\t46506\r\n", "NC_035780.1\t85606\t95254\r\n", "NC_035780.1\t99840\t106460\r\n", "NC_035780.1\t108305\t110077\r\n", "NC_035780.1\t151859\t157536\r\n", "NC_035780.1\t163809\t183798\r\n", "NC_035780.1\t164820\t166793\r\n", "NC_035780.1\t190449\t193594\r\n" ] } ], "source": [ "!head {mRNAList} #The original file, just for comparison" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\tGnomon\tmRNA\t27961\t28960\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n", 
"NC_035780.1\tGnomon\tmRNA\t33325\t34324\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n", "NC_035780.1\tGnomon\tmRNA\t42111\t43110\t.\t-\t.\tID=rna2;Parent=gene2;Dbxref=GeneID:111110729,Genbank:XM_022447324.1;Name=XM_022447324.1;gbkey=mRNA;gene=LOC111110729;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments;product=FMRFamide receptor-like%2C transcript variant X1;transcript_id=XM_022447324.1\r\n", "NC_035780.1\tGnomon\tmRNA\t66898\t67897\t.\t-\t.\tID=rna2;Parent=gene2;Dbxref=GeneID:111110729,Genbank:XM_022447324.1;Name=XM_022447324.1;gbkey=mRNA;gene=LOC111110729;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments;product=FMRFamide receptor-like%2C transcript variant X1;transcript_id=XM_022447324.1\r\n", "NC_035780.1\tGnomon\tmRNA\t42111\t43110\t.\t-\t.\tID=rna3;Parent=gene2;Dbxref=GeneID:111110729,Genbank:XM_022447333.1;Name=XM_022447333.1;gbkey=mRNA;gene=LOC111110729;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 14 samples with support for all annotated introns;product=FMRFamide receptor-like%2C transcript variant X2;transcript_id=XM_022447333.1\r\n", "NC_035780.1\tGnomon\tmRNA\t46507\t47506\t.\t-\t.\tID=rna3;Parent=gene2;Dbxref=GeneID:111110729,Genbank:XM_022447333.1;Name=XM_022447333.1;gbkey=mRNA;gene=LOC111110729;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated 
genomic feature by RNAseq alignments%2C including 14 samples with support for all annotated introns;product=FMRFamide receptor-like%2C transcript variant X2;transcript_id=XM_022447333.1\r\n", "NC_035780.1\tGnomon\tmRNA\t84606\t85605\t.\t-\t.\tID=rna4;Parent=gene3;Dbxref=GeneID:111112434,Genbank:XM_022449924.1;Name=XM_022449924.1;gbkey=mRNA;gene=LOC111112434;model_evidence=Supporting evidence includes similarity to: 7 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 13 samples with support for all annotated introns;product=homeobox protein Hox-B7-like;transcript_id=XM_022449924.1\r\n", "NC_035780.1\tGnomon\tmRNA\t95255\t96254\t.\t-\t.\tID=rna4;Parent=gene3;Dbxref=GeneID:111112434,Genbank:XM_022449924.1;Name=XM_022449924.1;gbkey=mRNA;gene=LOC111112434;model_evidence=Supporting evidence includes similarity to: 7 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 13 samples with support for all annotated introns;product=homeobox protein Hox-B7-like;transcript_id=XM_022449924.1\r\n", "NC_035780.1\tGnomon\tmRNA\t98840\t99839\t.\t+\t.\tID=rna5;Parent=gene4;Dbxref=GeneID:111120752,Genbank:XM_022461698.1;Name=XM_022461698.1;gbkey=mRNA;gene=LOC111120752;model_evidence=Supporting evidence includes similarity to: 10 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 27 samples with support for all annotated introns;product=ribulose-phosphate 3-epimerase-like;transcript_id=XM_022461698.1\r\n", "NC_035780.1\tGnomon\tmRNA\t106461\t107460\t.\t+\t.\tID=rna5;Parent=gene4;Dbxref=GeneID:111120752,Genbank:XM_022461698.1;Name=XM_022461698.1;gbkey=mRNA;gene=LOC111120752;model_evidence=Supporting evidence includes similarity to: 10 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 27 samples with support for all annotated introns;product=ribulose-phosphate 
3-epimerase-like;transcript_id=XM_022461698.1\r\n" ] } ], "source": [ "!head 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-Flanks.bed #Isolated flanks. The first entry is the upstream flank for the first mRNA coding region, second is the downstream flank for the mRNA coding region, etc." ] }, { "cell_type": "code", "execution_count": 114, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 120402 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-Flanks.bed\r\n" ] } ], "source": [ "!wc -l 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-Flanks.bed" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that I have these flanks, I want to separate the upstream flank from the downstream flank. I will do this using `awk`. If the row number is odd, the row goes into the upstream flank file. If the row number is even, it goes into the downstream flank file." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [], "source": [ "!awk '{ if (NR%2) print > \"2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Upstream-Flanks.bed\"; \\\n", "else print > \"2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Downstream-Flanks.bed\" }' \\\n", "2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-Flanks.bed" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Upstream flanks" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\tGnomon\tmRNA\t27961\t28960\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like 
protein;transcript_id=XM_022471938.1\r\n", "NC_035780.1\tGnomon\tmRNA\t42111\t43110\t.\t-\t.\tID=rna2;Parent=gene2;Dbxref=GeneID:111110729,Genbank:XM_022447324.1;Name=XM_022447324.1;gbkey=mRNA;gene=LOC111110729;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments;product=FMRFamide receptor-like%2C transcript variant X1;transcript_id=XM_022447324.1\r\n", "NC_035780.1\tGnomon\tmRNA\t42111\t43110\t.\t-\t.\tID=rna3;Parent=gene2;Dbxref=GeneID:111110729,Genbank:XM_022447333.1;Name=XM_022447333.1;gbkey=mRNA;gene=LOC111110729;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 14 samples with support for all annotated introns;product=FMRFamide receptor-like%2C transcript variant X2;transcript_id=XM_022447333.1\r\n", "NC_035780.1\tGnomon\tmRNA\t84606\t85605\t.\t-\t.\tID=rna4;Parent=gene3;Dbxref=GeneID:111112434,Genbank:XM_022449924.1;Name=XM_022449924.1;gbkey=mRNA;gene=LOC111112434;model_evidence=Supporting evidence includes similarity to: 7 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 13 samples with support for all annotated introns;product=homeobox protein Hox-B7-like;transcript_id=XM_022449924.1\r\n", "NC_035780.1\tGnomon\tmRNA\t98840\t99839\t.\t+\t.\tID=rna5;Parent=gene4;Dbxref=GeneID:111120752,Genbank:XM_022461698.1;Name=XM_022461698.1;gbkey=mRNA;gene=LOC111120752;model_evidence=Supporting evidence includes similarity to: 10 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 27 samples with support for all annotated introns;product=ribulose-phosphate 3-epimerase-like;transcript_id=XM_022461698.1\r\n", 
"NC_035780.1\tGnomon\tmRNA\t107305\t108304\t.\t-\t.\tID=rna6;Parent=gene5;Dbxref=GeneID:111128944,Genbank:XM_022474921.1;Name=XM_022474921.1;gbkey=mRNA;gene=LOC111128944;model_evidence=Supporting evidence includes similarity to: 2 Proteins%2C and 93%25 coverage of the annotated genomic feature by RNAseq alignments;partial=true;product=mucin-19-like;start_range=.,108305;transcript_id=XM_022474921.1\r\n", "NC_035780.1\tGnomon\tmRNA\t150859\t151858\t.\t+\t.\tID=rna7;Parent=gene6;Dbxref=GeneID:111128953,Genbank:XM_022474931.1;Name=XM_022474931.1;gbkey=mRNA;gene=LOC111128953;model_evidence=Supporting evidence includes similarity to: 1 Protein;product=GATA zinc finger domain-containing protein 14-like;transcript_id=XM_022474931.1\r\n", "NC_035780.1\tGnomon\tmRNA\t162809\t163808\t.\t-\t.\tID=rna8;Parent=gene7;Dbxref=GeneID:111105691,Genbank:XM_022440054.1;Name=XM_022440054.1;gbkey=mRNA;gene=LOC111105691;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 9 samples with support for all annotated introns;product=uncharacterized LOC111105691;transcript_id=XM_022440054.1\r\n", "NC_035780.1\tGnomon\tmRNA\t163820\t164819\t.\t+\t.\tID=rna9;Parent=gene8;Dbxref=GeneID:111105685,Genbank:XM_022440042.1;Name=XM_022440042.1;gbkey=mRNA;gene=LOC111105685;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 4 samples with support for all annotated introns;product=protein ANTAGONIST OF LIKE HETEROCHROMATIN PROTEIN 1-like;transcript_id=XM_022440042.1\r\n", "NC_035780.1\tGnomon\tmRNA\t189449\t190448\t.\t-\t.\tID=rna11;Parent=gene10;Dbxref=GeneID:111133554,Genbank:XM_022482070.1;Name=XM_022482070.1;gbkey=mRNA;gene=LOC111133554;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 3 samples with support for all annotated 
introns;product=putative uncharacterized protein DDB_G0277407;transcript_id=XM_022482070.1\r\n" ] } ], "source": [ "!head 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Upstream-Flanks.bed" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 60200 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Upstream-Flanks.bed\r\n" ] } ], "source": [ "!wc -l 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Upstream-Flanks.bed" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Downstream flanks" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\tGnomon\tmRNA\t33325\t34324\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n", "NC_035780.1\tGnomon\tmRNA\t66898\t67897\t.\t-\t.\tID=rna2;Parent=gene2;Dbxref=GeneID:111110729,Genbank:XM_022447324.1;Name=XM_022447324.1;gbkey=mRNA;gene=LOC111110729;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments;product=FMRFamide receptor-like%2C transcript variant X1;transcript_id=XM_022447324.1\r\n", "NC_035780.1\tGnomon\tmRNA\t46507\t47506\t.\t-\t.\tID=rna3;Parent=gene2;Dbxref=GeneID:111110729,Genbank:XM_022447333.1;Name=XM_022447333.1;gbkey=mRNA;gene=LOC111110729;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 14 samples with support for all annotated 
introns;product=FMRFamide receptor-like%2C transcript variant X2;transcript_id=XM_022447333.1\r\n", "NC_035780.1\tGnomon\tmRNA\t95255\t96254\t.\t-\t.\tID=rna4;Parent=gene3;Dbxref=GeneID:111112434,Genbank:XM_022449924.1;Name=XM_022449924.1;gbkey=mRNA;gene=LOC111112434;model_evidence=Supporting evidence includes similarity to: 7 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 13 samples with support for all annotated introns;product=homeobox protein Hox-B7-like;transcript_id=XM_022449924.1\r\n", "NC_035780.1\tGnomon\tmRNA\t106461\t107460\t.\t+\t.\tID=rna5;Parent=gene4;Dbxref=GeneID:111120752,Genbank:XM_022461698.1;Name=XM_022461698.1;gbkey=mRNA;gene=LOC111120752;model_evidence=Supporting evidence includes similarity to: 10 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 27 samples with support for all annotated introns;product=ribulose-phosphate 3-epimerase-like;transcript_id=XM_022461698.1\r\n", "NC_035780.1\tGnomon\tmRNA\t110078\t111077\t.\t-\t.\tID=rna6;Parent=gene5;Dbxref=GeneID:111128944,Genbank:XM_022474921.1;Name=XM_022474921.1;gbkey=mRNA;gene=LOC111128944;model_evidence=Supporting evidence includes similarity to: 2 Proteins%2C and 93%25 coverage of the annotated genomic feature by RNAseq alignments;partial=true;product=mucin-19-like;start_range=.,108305;transcript_id=XM_022474921.1\r\n", "NC_035780.1\tGnomon\tmRNA\t157537\t158536\t.\t+\t.\tID=rna7;Parent=gene6;Dbxref=GeneID:111128953,Genbank:XM_022474931.1;Name=XM_022474931.1;gbkey=mRNA;gene=LOC111128953;model_evidence=Supporting evidence includes similarity to: 1 Protein;product=GATA zinc finger domain-containing protein 14-like;transcript_id=XM_022474931.1\r\n", "NC_035780.1\tGnomon\tmRNA\t183799\t184798\t.\t-\t.\tID=rna8;Parent=gene7;Dbxref=GeneID:111105691,Genbank:XM_022440054.1;Name=XM_022440054.1;gbkey=mRNA;gene=LOC111105691;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of 
the annotated genomic feature by RNAseq alignments%2C including 9 samples with support for all annotated introns;product=uncharacterized LOC111105691;transcript_id=XM_022440054.1\r\n", "NC_035780.1\tGnomon\tmRNA\t166794\t167793\t.\t+\t.\tID=rna9;Parent=gene8;Dbxref=GeneID:111105685,Genbank:XM_022440042.1;Name=XM_022440042.1;gbkey=mRNA;gene=LOC111105685;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 4 samples with support for all annotated introns;product=protein ANTAGONIST OF LIKE HETEROCHROMATIN PROTEIN 1-like;transcript_id=XM_022440042.1\r\n", "NC_035780.1\tGnomon\tmRNA\t193595\t194594\t.\t-\t.\tID=rna11;Parent=gene10;Dbxref=GeneID:111133554,Genbank:XM_022482070.1;Name=XM_022482070.1;gbkey=mRNA;gene=LOC111133554;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 3 samples with support for all annotated introns;product=putative uncharacterized protein DDB_G0277407;transcript_id=XM_022482070.1\r\n" ] } ], "source": [ "!head 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Downstream-Flanks.bed" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 60200 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Downstream-Flanks.bed\r\n" ] } ], "source": [ "!wc -l 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Downstream-Flanks.bed" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some transcripts are on the forward strand (+) while others are on the reverse strand (-), and the promoter track needs to reflect this. The upstream and downstream flank files follow genomic coordinates, so the promoter of a forward-strand transcript is in its upstream flank, while the promoter of a reverse-strand transcript is in its downstream flank. The upstream flanks should therefore only include forward-strand transcripts, and the downstream flanks should only include reverse-strand transcripts. I'll use `grep` to ensure the correct strands are included with each flank type." 
] }, { "cell_type": "code", "execution_count": 49, "metadata": { "collapsed": false }, "outputs": [], "source": [ "!grep \".\t+\t.\" 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Upstream-Flanks.bed \\\n", "> 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Upstream-Flanks-Forward-Strands.bed" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035789.1\tGnomon\tmRNA\t31986775\t31987774\t.\t+\t.\tID=rna67151;Parent=gene39462;Dbxref=GeneID:111116210,Genbank:XM_022455194.1;Name=XM_022455194.1;gbkey=mRNA;gene=LOC111116210;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 1 sample with support for all annotated introns;product=lectin BRA-3-like;transcript_id=XM_022455194.1\r\n", "NC_035789.1\tGnomon\tmRNA\t32263001\t32264000\t.\t+\t.\tID=rna67162;Parent=gene39469;Dbxref=GeneID:111117115,Genbank:XM_022456201.1;Name=XM_022456201.1;gbkey=mRNA;gene=LOC111117115;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 13 samples with support for all annotated introns;product=uncharacterized LOC111117115;transcript_id=XM_022456201.1\r\n", "NC_035789.1\tGnomon\tmRNA\t32358936\t32359935\t.\t+\t.\tID=rna67169;Parent=gene39476;Dbxref=GeneID:111116603,Genbank:XM_022455604.1;Name=XM_022455604.1;gbkey=mRNA;gene=LOC111116603;model_evidence=Supporting evidence includes similarity to: 87%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 1 sample with support for all annotated introns;product=uncharacterized LOC111116603;transcript_id=XM_022455604.1\r\n", "NC_035789.1\tGnomon\tmRNA\t32380138\t32381137\t.\t+\t.\tID=rna67171;Parent=gene39478;Dbxref=GeneID:111116908,Genbank:XM_022455963.1;Name=XM_022455963.1;gbkey=mRNA;gene=LOC111116908;model_evidence=Supporting 
evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 8 samples with support for all annotated introns;product=CD209 antigen-like protein D%2C transcript variant X2;transcript_id=XM_022455963.1\r\n", "NC_035789.1\tGnomon\tmRNA\t32382102\t32383101\t.\t+\t.\tID=rna67172;Parent=gene39478;Dbxref=GeneID:111116908,Genbank:XM_022455964.1;Name=XM_022455964.1;gbkey=mRNA;gene=LOC111116908;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 3 samples with support for all annotated introns;product=CD209 antigen-like protein D%2C transcript variant X3;transcript_id=XM_022455964.1\r\n", "NC_035789.1\tGnomon\tmRNA\t32383129\t32384128\t.\t+\t.\tID=rna67173;Parent=gene39478;Dbxref=GeneID:111116908,Genbank:XM_022455961.1;Name=XM_022455961.1;gbkey=mRNA;gene=LOC111116908;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 8 samples with support for all annotated introns;product=CD209 antigen-like protein D%2C transcript variant X1;transcript_id=XM_022455961.1\r\n", "NC_035789.1\tGnomon\tmRNA\t32426213\t32427212\t.\t+\t.\tID=rna67175;Parent=gene39480;Dbxref=GeneID:111117854,Genbank:XM_022457081.1;Name=XM_022457081.1;gbkey=mRNA;gene=LOC111117854;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 2 samples with support for all annotated introns;product=uncharacterized LOC111117854;transcript_id=XM_022457081.1\r\n", "NC_035789.1\tGnomon\tmRNA\t32434313\t32435312\t.\t+\t.\tID=rna67176;Parent=gene39481;Dbxref=GeneID:111116604,Genbank:XM_022455605.1;Name=XM_022455605.1;gbkey=mRNA;gene=LOC111116604;model_evidence=Supporting evidence includes similarity to: 73%25 coverage of the annotated genomic feature by RNAseq alignments;product=uncharacterized 
LOC111116604;transcript_id=XM_022455605.1\r\n", "NC_035789.1\tGnomon\tmRNA\t32441380\t32442379\t.\t+\t.\tID=rna67178;Parent=gene39483;Dbxref=GeneID:111116951,Genbank:XM_022456008.1;Name=XM_022456008.1;gbkey=mRNA;gene=LOC111116951;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 25%25 coverage of the annotated genomic feature by RNAseq alignments;product=uncharacterized LOC111116951;transcript_id=XM_022456008.1\r\n", "NC_035789.1\tGnomon\tmRNA\t32640298\t32641297\t.\t+\t.\tID=rna67188;Parent=gene39492;Dbxref=GeneID:111116608,Genbank:XM_022455609.1;Name=XM_022455609.1;gbkey=mRNA;gene=LOC111116608;model_evidence=Supporting evidence includes similarity to: 5 Proteins;product=mucin-2-like;transcript_id=XM_022455609.1\r\n" ] } ], "source": [ "!tail 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Upstream-Flanks-Forward-Strands.bed" ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 30218 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Upstream-Flanks-Forward-Strands.bed\r\n" ] } ], "source": [ "!wc -l 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Upstream-Flanks-Forward-Strands.bed" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "collapsed": false }, "outputs": [], "source": [ "!grep \".\t-\t.\" 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Downstream-Flanks.bed \\\n", "> 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Downstream-Flanks-Reverse-Strands.bed" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035789.1\tGnomon\tmRNA\t32359462\t32360461\t.\t-\t.\tID=rna67168;Parent=gene39475;Dbxref=GeneID:111117116,Genbank:XM_022456202.1;Name=XM_022456202.1;gbkey=mRNA;gene=LOC111117116;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq 
alignments;product=receptor-type tyrosine-protein phosphatase kappa-like;transcript_id=XM_022456202.1\r\n", "NC_035789.1\tGnomon\tmRNA\t32363586\t32364585\t.\t-\t.\tID=rna67170;Parent=gene39477;Dbxref=GeneID:111117123,Genbank:XM_022456210.1;Name=XM_022456210.1;gbkey=mRNA;gene=LOC111117123;model_evidence=Supporting evidence includes similarity to: 2 Proteins%2C and 64%25 coverage of the annotated genomic feature by RNAseq alignments;product=putative nuclease HARBI1;transcript_id=XM_022456210.1\r\n", "NC_035789.1\tGnomon\tmRNA\t32447885\t32448884\t.\t-\t.\tID=rna67179;Parent=gene39484;Dbxref=GeneID:111116605,Genbank:XM_022455606.1;Name=XM_022455606.1;gbkey=mRNA;gene=LOC111116605;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 70%25 coverage of the annotated genomic feature by RNAseq alignments;product=uncharacterized LOC111116605;transcript_id=XM_022455606.1\r\n", "NC_035789.1\tGnomon\tmRNA\t32455838\t32456837\t.\t-\t.\tID=rna67180;Parent=gene39485;Dbxref=GeneID:111116846,Genbank:XM_022455871.1;Name=XM_022455871.1;gbkey=mRNA;gene=LOC111116846;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 74%25 coverage of the annotated genomic feature by RNAseq alignments;product=uncharacterized LOC111116846;transcript_id=XM_022455871.1\r\n", "NC_035789.1\tGnomon\tmRNA\t32459753\t32460752\t.\t-\t.\tID=rna67181;Parent=gene39486;Dbxref=GeneID:111116606,Genbank:XM_022455608.1;Name=XM_022455608.1;gbkey=mRNA;gene=LOC111116606;model_evidence=Supporting evidence includes similarity to: 81%25 coverage of the annotated genomic feature by RNAseq alignments;product=uncharacterized LOC111116606;transcript_id=XM_022455608.1\r\n", "NC_035789.1\tGnomon\tmRNA\t32477025\t32478024\t.\t-\t.\tID=rna67182;Parent=gene39487;Dbxref=GeneID:111117860,Genbank:XM_022457087.1;Name=XM_022457087.1;gbkey=mRNA;gene=LOC111117860;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 97%25 coverage of the annotated genomic feature by RNAseq 
alignments%2C including 3 samples with support for all annotated introns;product=IgGFc-binding protein-like;transcript_id=XM_022457087.1\r\n", "NC_035789.1\tGnomon\tmRNA\t32507692\t32508691\t.\t-\t.\tID=rna67183;Parent=gene39488;Dbxref=GeneID:111116849,Genbank:XM_022455873.1;Name=XM_022455873.1;gbkey=mRNA;gene=LOC111116849;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 1 sample with support for all annotated introns;product=killer cell lectin-like receptor subfamily B member 1B allele B;transcript_id=XM_022455873.1\r\n", "NC_035789.1\tGnomon\tmRNA\t32528900\t32529899\t.\t-\t.\tID=rna67184;Parent=gene39489;Dbxref=GeneID:111117715,Genbank:XM_022456886.1;Name=XM_022456886.1;gbkey=mRNA;gene=LOC111117715;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 1 sample with support for all annotated introns;product=CD209 antigen-like protein A;transcript_id=XM_022456886.1\r\n", "NC_035789.1\tGnomon\tmRNA\t32627096\t32628095\t.\t-\t.\tID=rna67185;Parent=gene39490;Dbxref=GeneID:111117691,Genbank:XM_022456856.1;Name=XM_022456856.1;gbkey=mRNA;gene=LOC111117691;model_evidence=Supporting evidence includes similarity to: 2 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 1 sample with support for all annotated introns;product=IgGFc-binding protein-like%2C transcript variant X2;transcript_id=XM_022456856.1\r\n", "NC_035789.1\tGnomon\tmRNA\t32627035\t32628034\t.\t-\t.\tID=rna67186;Parent=gene39490;Dbxref=GeneID:111117691,Genbank:XM_022456855.1;Name=XM_022456855.1;gbkey=mRNA;gene=LOC111117691;model_evidence=Supporting evidence includes similarity to: 2 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 2 samples with support for all annotated introns;product=IgGFc-binding protein-like%2C transcript variant 
X1;transcript_id=XM_022456855.1\r\n" ] } ], "source": [ "!tail 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Downstream-Flanks-Reverse-Strands.bed" ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 29982 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Downstream-Flanks-Reverse-Strands.bed\r\n" ] } ], "source": [ "!wc -l 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Downstream-Flanks-Reverse-Strands.bed" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, I'll combine the upstream flanks of forward-strand transcripts with the downstream flanks of reverse-strand transcripts to create an official promoter track." ] }, { "cell_type": "code", "execution_count": 60, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!cat \\\n", "2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Upstream-Flanks-Forward-Strands.bed \\\n", "2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Downstream-Flanks-Reverse-Strands.bed \\\n", "> 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Promoter-Track.bed" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "collapsed": true }, "outputs": [], "source": [ "promoterTrack = \"2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Promoter-Track.bed\"" ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 60200 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Promoter-Track.bed\r\n" ] } ], "source": [ "!wc -l {promoterTrack}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now I'll take the promoter track I made and use it in intersectBed to find overlaps with DML and CG motifs!\n", "\n", "1. Path to intersectBed\n", "2. -wo: Write the original entries from both files, plus the number of overlapping base pairs\n", "3. -a: Path to promoter track\n", "4. -b: Specify either DML, DMR, or CG motif file. 
Overlaps between the flanks and CG motifs can be used as a background when comparing DML-promoter results\n", "5. \">\" filename: Redirect output to a .txt file" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "#### DML" ] }, { "cell_type": "code", "execution_count": 70, "metadata": { "collapsed": false }, "outputs": [], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-wo \\\n", "-a {promoterTrack} \\\n", "-b {DMLlist} \\\n", "> 2019-05-29-Flanking-Analysis/2019-09-26-Promoter-DML.txt" ] }, { "cell_type": "code", "execution_count": 71, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\tGnomon\tmRNA\t8832968\t8833967\t.\t+\t.\tID=rna875;Parent=gene522;Dbxref=GeneID:111138488,Genbank:XM_022490485.1;Name=XM_022490485.1;gbkey=mRNA;gene=LOC111138488;model_evidence=Supporting evidence includes similarity to: 5 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 26 samples with support for all annotated introns;product=hsp70-binding protein 1-like;transcript_id=XM_022490485.1\tNC_035780.1\t8833124\t8833126\t60\t1\r\n", "NC_035781.1\tGnomon\tmRNA\t7626060\t7627059\t.\t+\t.\tID=rna7537;Parent=gene4444;Dbxref=GeneID:111120066,Genbank:XM_022460716.1;Name=XM_022460716.1;gbkey=mRNA;gene=LOC111120066;model_evidence=Supporting evidence includes similarity to: 4 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 15 samples with support for all annotated introns;product=short-chain dehydrogenase/reductase family 42E member 1-like%2C transcript variant X2;transcript_id=XM_022460716.1\tNC_035781.1\t7626510\t7626512\t-56\t1\r\n", "NC_035781.1\tGnomon\tmRNA\t7626070\t7627069\t.\t+\t.\tID=rna7538;Parent=gene4444;Dbxref=GeneID:111120066,Genbank:XM_022460717.1;Name=XM_022460717.1;gbkey=mRNA;gene=LOC111120066;model_evidence=Supporting evidence includes similarity to: 4 
Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 16 samples with support for all annotated introns;product=short-chain dehydrogenase/reductase family 42E member 1-like%2C transcript variant X3;transcript_id=XM_022460717.1\tNC_035781.1\t7626510\t7626512\t-56\t1\r\n", "NC_035782.1\tGnomon\tmRNA\t4729317\t4730316\t.\t+\t.\tID=rna13975;Parent=gene8312;Dbxref=GeneID:111123849,Genbank:XM_022466485.1;Name=XM_022466485.1;gbkey=mRNA;gene=LOC111123849;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 2 samples with support for all annotated introns;product=tripartite motif-containing protein 2-like%2C transcript variant X9;transcript_id=XM_022466485.1\tNC_035782.1\t4729348\t4729350\t55\t1\r\n", "NC_035782.1\tGnomon\tmRNA\t4729322\t4730321\t.\t+\t.\tID=rna13976;Parent=gene8312;Dbxref=GeneID:111123849,Genbank:XM_022466484.1;Name=XM_022466484.1;gbkey=mRNA;gene=LOC111123849;model_evidence=Supporting evidence includes similarity to: 7 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 14 samples with support for all annotated introns;product=tripartite motif-containing protein 2-like%2C transcript variant X8;transcript_id=XM_022466484.1\tNC_035782.1\t4729348\t4729350\t55\t1\r\n", "NC_035783.1\tGnomon\tmRNA\t50075390\t50076389\t.\t+\t.\tID=rna27643;Parent=gene16023;Dbxref=GeneID:111130373,Genbank:XM_022477391.1;Name=XM_022477391.1;Note=The sequence of the model RefSeq transcript was modified relative to this genomic sequence to represent the inferred CDS: inserted 5 bases in 5 codons;exception=unclassified transcription discrepancy;gbkey=mRNA;gene=LOC111130373;model_evidence=Supporting evidence includes similarity to: 4 Proteins%2C and 92%25 coverage of the annotated genomic feature by RNAseq alignments;product=rap guanine nucleotide exchange factor 
4-like;transcript_id=XM_022477391.1\tNC_035783.1\t50075652\t50075654\t-52\t1\r\n", "NC_035783.1\tGnomon\tmRNA\t57930863\t57931862\t.\t+\t.\tID=rna28491;Parent=gene16557;Dbxref=GeneID:111129795,Genbank:XM_022476252.1;Name=XM_022476252.1;gbkey=mRNA;gene=LOC111129795;model_evidence=Supporting evidence includes similarity to: 2 Proteins;product=S phase cyclin A-associated protein in the endoplasmic reticulum-like;transcript_id=XM_022476252.1\tNC_035783.1\t57931740\t57931742\t50\t1\r\n", "NC_035784.1\tGnomon\tmRNA\t26799941\t26800940\t.\t+\t.\tID=rna31823;Parent=gene18466;Dbxref=GeneID:111133595,Genbank:XM_022482103.1;Name=XM_022482103.1;gbkey=mRNA;gene=LOC111133595;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 9 samples with support for all annotated introns;product=inactive peptidyl-prolyl cis-trans isomerase FKBP6-like;transcript_id=XM_022482103.1\tNC_035784.1\t26800339\t26800341\t66\t1\r\n", "NC_035786.1\tGnomon\tmRNA\t33053514\t33054513\t.\t+\t.\tID=rna47206;Parent=gene27222;Dbxref=GeneID:111105024,Genbank:XM_022439201.1;Name=XM_022439201.1;gbkey=mRNA;gene=LOC111105024;model_evidence=Supporting evidence includes similarity to: 3 Proteins;product=serine/threonine-protein phosphatase 6 regulatory ankyrin repeat subunit B-like;transcript_id=XM_022439201.1\tNC_035786.1\t33053638\t33053640\t50\t1\r\n", "NC_035787.1\tGnomon\tmRNA\t72590263\t72591262\t.\t+\t.\tID=rna56618;Parent=gene32848;Dbxref=GeneID:111105716,Genbank:XM_022440074.1;Name=XM_022440074.1;gbkey=mRNA;gene=LOC111105716;model_evidence=Supporting evidence includes similarity to: 2 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 3 samples with support for all annotated introns;product=protein polybromo-1-like%2C transcript variant X1;transcript_id=XM_022440074.1\tNC_035787.1\t72590356\t72590358\t63\t1\r\n" ] } ], "source": [ "!head 
2019-05-29-Flanking-Analysis/2019-09-26-Promoter-DML.txt" ] }, { "cell_type": "code", "execution_count": 72, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 42 2019-05-29-Flanking-Analysis/2019-09-26-Promoter-DML.txt\r\n" ] } ], "source": [ "!wc -l 2019-05-29-Flanking-Analysis/2019-09-26-Promoter-DML.txt" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "#### Hypermethylated DML" ] }, { "cell_type": "code", "execution_count": 73, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-wo \\\n", "-a {promoterTrack} \\\n", "-b {hyperDML} \\\n", "> 2019-05-29-Flanking-Analysis/2019-09-26-Promoter-Hypermethylated-DML.txt" ] }, { "cell_type": "code", "execution_count": 74, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\tGnomon\tmRNA\t8832968\t8833967\t.\t+\t.\tID=rna875;Parent=gene522;Dbxref=GeneID:111138488,Genbank:XM_022490485.1;Name=XM_022490485.1;gbkey=mRNA;gene=LOC111138488;model_evidence=Supporting evidence includes similarity to: 5 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 26 samples with support for all annotated introns;product=hsp70-binding protein 1-like;transcript_id=XM_022490485.1\tNC_035780.1\t8833124\t8833126\t60\t1\r\n", "NC_035782.1\tGnomon\tmRNA\t4729317\t4730316\t.\t+\t.\tID=rna13975;Parent=gene8312;Dbxref=GeneID:111123849,Genbank:XM_022466485.1;Name=XM_022466485.1;gbkey=mRNA;gene=LOC111123849;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 2 samples with support for all annotated introns;product=tripartite motif-containing protein 2-like%2C transcript variant X9;transcript_id=XM_022466485.1\tNC_035782.1\t4729348\t4729350\t55\t1\r\n", 
"NC_035782.1\tGnomon\tmRNA\t4729322\t4730321\t.\t+\t.\tID=rna13976;Parent=gene8312;Dbxref=GeneID:111123849,Genbank:XM_022466484.1;Name=XM_022466484.1;gbkey=mRNA;gene=LOC111123849;model_evidence=Supporting evidence includes similarity to: 7 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 14 samples with support for all annotated introns;product=tripartite motif-containing protein 2-like%2C transcript variant X8;transcript_id=XM_022466484.1\tNC_035782.1\t4729348\t4729350\t55\t1\r\n", "NC_035783.1\tGnomon\tmRNA\t57930863\t57931862\t.\t+\t.\tID=rna28491;Parent=gene16557;Dbxref=GeneID:111129795,Genbank:XM_022476252.1;Name=XM_022476252.1;gbkey=mRNA;gene=LOC111129795;model_evidence=Supporting evidence includes similarity to: 2 Proteins;product=S phase cyclin A-associated protein in the endoplasmic reticulum-like;transcript_id=XM_022476252.1\tNC_035783.1\t57931740\t57931742\t50\t1\r\n", "NC_035784.1\tGnomon\tmRNA\t26799941\t26800940\t.\t+\t.\tID=rna31823;Parent=gene18466;Dbxref=GeneID:111133595,Genbank:XM_022482103.1;Name=XM_022482103.1;gbkey=mRNA;gene=LOC111133595;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 9 samples with support for all annotated introns;product=inactive peptidyl-prolyl cis-trans isomerase FKBP6-like;transcript_id=XM_022482103.1\tNC_035784.1\t26800339\t26800341\t66\t1\r\n", "NC_035786.1\tGnomon\tmRNA\t33053514\t33054513\t.\t+\t.\tID=rna47206;Parent=gene27222;Dbxref=GeneID:111105024,Genbank:XM_022439201.1;Name=XM_022439201.1;gbkey=mRNA;gene=LOC111105024;model_evidence=Supporting evidence includes similarity to: 3 Proteins;product=serine/threonine-protein phosphatase 6 regulatory ankyrin repeat subunit B-like;transcript_id=XM_022439201.1\tNC_035786.1\t33053638\t33053640\t50\t1\r\n", 
"NC_035787.1\tGnomon\tmRNA\t72590263\t72591262\t.\t+\t.\tID=rna56618;Parent=gene32848;Dbxref=GeneID:111105716,Genbank:XM_022440074.1;Name=XM_022440074.1;gbkey=mRNA;gene=LOC111105716;model_evidence=Supporting evidence includes similarity to: 2 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 3 samples with support for all annotated introns;product=protein polybromo-1-like%2C transcript variant X1;transcript_id=XM_022440074.1\tNC_035787.1\t72590356\t72590358\t63\t1\r\n", "NC_035787.1\tGnomon\tmRNA\t72590263\t72591262\t.\t+\t.\tID=rna56619;Parent=gene32848;Dbxref=GeneID:111105716,Genbank:XM_022440092.1;Name=XM_022440092.1;gbkey=mRNA;gene=LOC111105716;model_evidence=Supporting evidence includes similarity to: 2 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 5 samples with support for all annotated introns;product=protein polybromo-1-like%2C transcript variant X18;transcript_id=XM_022440092.1\tNC_035787.1\t72590356\t72590358\t63\t1\r\n", "NC_035787.1\tGnomon\tmRNA\t72590263\t72591262\t.\t+\t.\tID=rna56620;Parent=gene32848;Dbxref=GeneID:111105716,Genbank:XM_022440094.1;Name=XM_022440094.1;gbkey=mRNA;gene=LOC111105716;model_evidence=Supporting evidence includes similarity to: 2 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 5 samples with support for all annotated introns;product=protein polybromo-1-like%2C transcript variant X20;transcript_id=XM_022440094.1\tNC_035787.1\t72590356\t72590358\t63\t1\r\n", "NC_035787.1\tGnomon\tmRNA\t72590263\t72591262\t.\t+\t.\tID=rna56621;Parent=gene32848;Dbxref=GeneID:111105716,Genbank:XM_022440084.1;Name=XM_022440084.1;gbkey=mRNA;gene=LOC111105716;model_evidence=Supporting evidence includes similarity to: 2 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 3 samples with support for all annotated introns;product=protein 
polybromo-1-like%2C transcript variant X10;transcript_id=XM_022440084.1\tNC_035787.1\t72590356\t72590358\t63\t1\r\n" ] } ], "source": [ "!head 2019-05-29-Flanking-Analysis/2019-09-26-Promoter-Hypermethylated-DML.txt" ] }, { "cell_type": "code", "execution_count": 75, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 37 2019-05-29-Flanking-Analysis/2019-09-26-Promoter-Hypermethylated-DML.txt\r\n" ] } ], "source": [ "!wc -l 2019-05-29-Flanking-Analysis/2019-09-26-Promoter-Hypermethylated-DML.txt" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "#### Hypomethylated DML" ] }, { "cell_type": "code", "execution_count": 76, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-wo \\\n", "-a {promoterTrack} \\\n", "-b {hypoDML} \\\n", "> 2019-05-29-Flanking-Analysis/2019-09-26-Promoter-Hypomethylated-DML.txt" ] }, { "cell_type": "code", "execution_count": 77, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035781.1\tGnomon\tmRNA\t7626060\t7627059\t.\t+\t.\tID=rna7537;Parent=gene4444;Dbxref=GeneID:111120066,Genbank:XM_022460716.1;Name=XM_022460716.1;gbkey=mRNA;gene=LOC111120066;model_evidence=Supporting evidence includes similarity to: 4 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 15 samples with support for all annotated introns;product=short-chain dehydrogenase/reductase family 42E member 1-like%2C transcript variant X2;transcript_id=XM_022460716.1\tNC_035781.1\t7626510\t7626512\t-56\t1\r\n", "NC_035781.1\tGnomon\tmRNA\t7626070\t7627069\t.\t+\t.\tID=rna7538;Parent=gene4444;Dbxref=GeneID:111120066,Genbank:XM_022460717.1;Name=XM_022460717.1;gbkey=mRNA;gene=LOC111120066;model_evidence=Supporting evidence includes similarity to: 4 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq 
alignments%2C including 16 samples with support for all annotated introns;product=short-chain dehydrogenase/reductase family 42E member 1-like%2C transcript variant X3;transcript_id=XM_022460717.1\tNC_035781.1\t7626510\t7626512\t-56\t1\r\n", "NC_035783.1\tGnomon\tmRNA\t50075390\t50076389\t.\t+\t.\tID=rna27643;Parent=gene16023;Dbxref=GeneID:111130373,Genbank:XM_022477391.1;Name=XM_022477391.1;Note=The sequence of the model RefSeq transcript was modified relative to this genomic sequence to represent the inferred CDS: inserted 5 bases in 5 codons;exception=unclassified transcription discrepancy;gbkey=mRNA;gene=LOC111130373;model_evidence=Supporting evidence includes similarity to: 4 Proteins%2C and 92%25 coverage of the annotated genomic feature by RNAseq alignments;product=rap guanine nucleotide exchange factor 4-like;transcript_id=XM_022477391.1\tNC_035783.1\t50075652\t50075654\t-52\t1\r\n", "NC_035781.1\tGnomon\tmRNA\t20125254\t20126253\t.\t-\t.\tID=rna9080;Parent=gene5340;Dbxref=GeneID:111122222,Genbank:XM_022463862.1;Name=XM_022463862.1;gbkey=mRNA;gene=LOC111122222;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 18 samples with support for all annotated introns;product=phosphatidylinositol N-acetylglucosaminyltransferase subunit Q-like;transcript_id=XM_022463862.1\tNC_035781.1\t20126029\t20126031\t-52\t1\r\n", "NC_035784.1\tGnomon\tmRNA\t90909407\t90910406\t.\t-\t.\tID=rna39145;Parent=gene22554;Dbxref=GeneID:111137470,Genbank:XM_022488933.1;Name=XM_022488933.1;gbkey=mRNA;gene=LOC111137470;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 1 sample with support for all annotated introns;product=uncharacterized LOC111137470;transcript_id=XM_022488933.1\tNC_035784.1\t90909622\t90909624\t-65\t1\r\n" ] } ], "source": [ "!head 
2019-05-29-Flanking-Analysis/2019-09-26-Promoter-Hypomethylated-DML.txt" ] }, { "cell_type": "code", "execution_count": 78, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 5 2019-05-29-Flanking-Analysis/2019-09-26-Promoter-Hypomethylated-DML.txt\r\n" ] } ], "source": [ "!wc -l 2019-05-29-Flanking-Analysis/2019-09-26-Promoter-Hypomethylated-DML.txt" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "#### CG motifs" ] }, { "cell_type": "code", "execution_count": 81, "metadata": { "collapsed": false }, "outputs": [], "source": [ "!{bedtoolsDirectory}intersectBed \\\n", "-wo \\\n", "-a {promoterTrack} \\\n", "-b {CGMotifList} \\\n", "> 2019-05-29-Flanking-Analysis/2019-09-26-Promoter-CGmotif.txt" ] }, { "cell_type": "code", "execution_count": 82, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\tGnomon\tmRNA\t27961\t28960\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\tNC_035780.1\t27969\t27971\tCG_motif\t1\r\n", "NC_035780.1\tGnomon\tmRNA\t27961\t28960\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\tNC_035780.1\t27979\t27981\tCG_motif\t1\r\n", 
"NC_035780.1\tGnomon\tmRNA\t27961\t28960\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\tNC_035780.1\t28081\t28083\tCG_motif\t1\r\n", "NC_035780.1\tGnomon\tmRNA\t27961\t28960\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\tNC_035780.1\t28130\t28132\tCG_motif\t1\r\n", "NC_035780.1\tGnomon\tmRNA\t27961\t28960\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\tNC_035780.1\t28147\t28149\tCG_motif\t1\r\n", "NC_035780.1\tGnomon\tmRNA\t27961\t28960\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\tNC_035780.1\t28169\t28171\tCG_motif\t1\r\n", 
"NC_035780.1\tGnomon\tmRNA\t27961\t28960\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\tNC_035780.1\t28209\t28211\tCG_motif\t1\r\n", "NC_035780.1\tGnomon\tmRNA\t27961\t28960\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\tNC_035780.1\t28211\t28213\tCG_motif\t1\r\n", "NC_035780.1\tGnomon\tmRNA\t27961\t28960\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\tNC_035780.1\t28228\t28230\tCG_motif\t1\r\n", "NC_035780.1\tGnomon\tmRNA\t27961\t28960\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\tNC_035780.1\t28308\t28310\tCG_motif\t1\r\n" ] } ], "source": [ "!head 
2019-05-29-Flanking-Analysis/2019-09-26-Promoter-CGmotif.txt" ] }, { "cell_type": "code", "execution_count": 83, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 1384653 2019-05-29-Flanking-Analysis/2019-09-26-Promoter-CGmotif.txt\r\n" ] } ], "source": [ "!wc -l 2019-05-29-Flanking-Analysis/2019-09-26-Promoter-CGmotif.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4b. No overlaps" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I also want to count the number of DML or DMR that do not overlap with any features (i.e. DML and DMR in unannotated intergenic regions). To do this, I'll use the `-v` argument in `bedtools`, which reports \"those entries in A that have no overlap in B.\" I can specify multiple files with `-b`. I'll use exons, introns, transposable elements identified using all species, and putative promoter regions (upstream flanks)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### DML" ] }, { "cell_type": "code", "execution_count": 88, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 21\n", "DML do not overlap with exons, introns, transposable elements (all), or putative promoters\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-v \\\n", "-a {DMLlist} \\\n", "-b {exonList} {intronList} {transposableElementsAll} {promoterTrack} \\\n", "| wc -l\n", "!echo \"DML do not overlap with exons, introns, transposable elements (all), or putative promoters\"" ] }, { "cell_type": "code", "execution_count": 89, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! 
{bedtoolsDirectory}intersectBed \\\n", "-v \\\n", "-a {DMLlist} \\\n", "-b {exonList} {intronList} {transposableElementsAll} {promoterTrack} \\\n", "> 2019-05-29-No-Overlap-DML.txt" ] }, { "cell_type": "code", "execution_count": 90, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035781.1\t20620123\t20620125\t57\r\n", "NC_035781.1\t30062222\t30062224\t60\r\n", "NC_035781.1\t31150010\t31150012\t53\r\n", "NC_035781.1\t39583208\t39583210\t-50\r\n", "NC_035781.1\t50711254\t50711256\t-71\r\n", "NC_035782.1\t6897406\t6897408\t-53\r\n", "NC_035782.1\t58675230\t58675232\t52\r\n", "NC_035782.1\t65377028\t65377030\t51\r\n", "NC_035782.1\t72205396\t72205398\t-55\r\n", "NC_035783.1\t11164219\t11164221\t-62\r\n" ] } ], "source": [ "!head 2019-05-29-No-Overlap-DML.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Hypermethylated DML" ] }, { "cell_type": "code", "execution_count": 91, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 11\n", "hypermethylated DML do not overlap with exons, introns, transposable elements (all), or putative promoters\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-v \\\n", "-a {hyperDML} \\\n", "-b {exonList} {intronList} {transposableElementsAll} {promoterTrack} \\\n", "| wc -l\n", "!echo \"hypermethylated DML do not overlap with exons, introns, transposable elements (all), or putative promoters\"" ] }, { "cell_type": "code", "execution_count": 92, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! 
{bedtoolsDirectory}intersectBed \\\n", "-v \\\n", "-a {hyperDML} \\\n", "-b {exonList} {intronList} {transposableElementsAll} {promoterTrack} \\\n", "> 2019-05-29-No-Overlap-Hypermethylated-DML.txt" ] }, { "cell_type": "code", "execution_count": 93, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035781.1\t20620123\t20620125\t57\r\n", "NC_035781.1\t30062222\t30062224\t60\r\n", "NC_035781.1\t31150010\t31150012\t53\r\n", "NC_035782.1\t58675230\t58675232\t52\r\n", "NC_035782.1\t65377028\t65377030\t51\r\n", "NC_035784.1\t45667412\t45667414\t56\r\n", "NC_035784.1\t53515949\t53515951\t50\r\n", "NC_035785.1\t31238802\t31238804\t59\r\n", "NC_035787.1\t42603398\t42603400\t57\r\n", "NC_035787.1\t44016221\t44016223\t70\r\n" ] } ], "source": [ "!head 2019-05-29-No-Overlap-Hypermethylated-DML.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Hypomethylated DML" ] }, { "cell_type": "code", "execution_count": 94, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 10\n", "hypomethylated DML do not overlap with exons, introns, transposable elements (all), or putative promoters\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-v \\\n", "-a {hypoDML} \\\n", "-b {exonList} {intronList} {transposableElementsAll} {promoterTrack} \\\n", "| wc -l\n", "!echo \"hypomethylated DML do not overlap with exons, introns, transposable elements (all), or putative promoters\"" ] }, { "cell_type": "code", "execution_count": 95, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! 
{bedtoolsDirectory}intersectBed \\\n", "-v \\\n", "-a {hypoDML} \\\n", "-b {exonList} {intronList} {transposableElementsAll} {promoterTrack} \\\n", "> 2019-05-29-No-Overlap-Hypomethylated-DML.txt" ] }, { "cell_type": "code", "execution_count": 96, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035781.1\t39583208\t39583210\t-50\r\n", "NC_035781.1\t50711254\t50711256\t-71\r\n", "NC_035782.1\t6897406\t6897408\t-53\r\n", "NC_035782.1\t72205396\t72205398\t-55\r\n", "NC_035783.1\t11164219\t11164221\t-62\r\n", "NC_035784.1\t2011997\t2011999\t-60\r\n", "NC_035784.1\t81666532\t81666534\t-65\r\n", "NC_035784.1\t92841543\t92841545\t-56\r\n", "NC_035787.1\t42755937\t42755939\t-54\r\n", "NC_035788.1\t78353418\t78353420\t-75\r\n" ] } ], "source": [ "!head 2019-05-29-No-Overlap-Hypomethylated-DML.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### CG motifs" ] }, { "cell_type": "code", "execution_count": 97, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 4499027\n", "CG motifs do not overlap with exons, introns, transposable elements (all), or putative promoters\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-v \\\n", "-a {CGMotifList} \\\n", "-b {exonList} {intronList} {transposableElementsAll} {promoterTrack} \\\n", "| wc -l\n", "!echo \"CG motifs do not overlap with exons, introns, transposable elements (all), or putative promoters\"" ] }, { "cell_type": "code", "execution_count": 98, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! 
{bedtoolsDirectory}intersectBed \\\n", "-v \\\n", "-a {CGMotifList} \\\n", "-b {exonList} {intronList} {transposableElementsAll} {promoterTrack} \\\n", "> 2019-05-29-No-Overlap-CGmotifs.txt" ] }, { "cell_type": "code", "execution_count": 99, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t28\t30\tCG_motif\r\n", "NC_035780.1\t54\t56\tCG_motif\r\n", "NC_035780.1\t75\t77\tCG_motif\r\n", "NC_035780.1\t93\t95\tCG_motif\r\n", "NC_035780.1\t103\t105\tCG_motif\r\n", "NC_035780.1\t116\t118\tCG_motif\r\n", "NC_035780.1\t134\t136\tCG_motif\r\n", "NC_035780.1\t159\t161\tCG_motif\r\n", "NC_035780.1\t209\t211\tCG_motif\r\n", "NC_035780.1\t224\t226\tCG_motif\r\n" ] } ], "source": [ "!head 2019-05-29-No-Overlap-CGmotifs.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4c. `closest`\n", "\n", "[`bedtools closest`](https://bedtools.readthedocs.io/en/latest/content/tools/closest.html) will find the nearest gene to a DML or DMR. If the closest gene does not overlap the DML or DMR, I'll get the distance to that gene. If the closest gene overlaps, the distance will be zero. I will use the following code:\n", "\n", "1. Path to `closestBed`\n", "2. -a: Specify either DML, DMR, or CG motif file.\n", "3. -b: Path to gene list\n", "4. -t all: In case of a tie, report all matches\n", "5. -D ref: Report distance to A in an extra column. Use negative distances to report upstream features with respect to the reference genome. B features with a lower (start, stop) are upstream.\n", "6. \">\" filename: Redirect output to a .txt file" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [], "source": [ "! 
{bedtoolsDirectory}closestBed \\\n", "-a {DMLlist} \\\n", "-b {geneList} \\\n", "-t all \\\n", "-D ref \\\n", "> 2019-05-29-Flanking-Analysis/2019-05-29-Genes-Closest-NoOverlap-DMLs.txt" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t401630\t401632\t53\tNC_035780.1\t394983\t409280\t0\r\n", "NC_035780.1\t571138\t571140\t58\tNC_035780.1\t544088\t573497\t0\r\n", "NC_035780.1\t1882691\t1882693\t64\tNC_035780.1\t1882143\t1890106\t0\r\n", "NC_035780.1\t1885022\t1885024\t61\tNC_035780.1\t1882143\t1890106\t0\r\n", "NC_035780.1\t1933499\t1933501\t51\tNC_035780.1\t1928718\t1940217\t0\r\n", "NC_035780.1\t2538924\t2538926\t-50\tNC_035780.1\t2524425\t2553408\t0\r\n", "NC_035780.1\t2541726\t2541728\t-54\tNC_035780.1\t2524425\t2553408\t0\r\n", "NC_035780.1\t2584492\t2584494\t56\tNC_035780.1\t2554181\t2599559\t0\r\n", "NC_035780.1\t2586508\t2586510\t-53\tNC_035780.1\t2554181\t2599559\t0\r\n", "NC_035780.1\t2589720\t2589722\t57\tNC_035780.1\t2554181\t2599559\t0\r\n" ] } ], "source": [ "!head 2019-05-29-Flanking-Analysis/2019-05-29-Genes-Closest-NoOverlap-DMLs.txt" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [], "source": [ "! 
{bedtoolsDirectory}closestBed \\\n", "-a {DMRlist} \\\n", "-b {geneList} \\\n", "-t all \\\n", "-D ref \\\n", "> 2019-05-29-Flanking-Analysis/2019-06-05-Genes-Closest-NoOverlap-DMRs.txt" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t571100\t571200\tDMR\t58\tNC_035780.1\t544088\t573497\t0\r\n", "NC_035780.1\t1885000\t1885100\tDMR\t50\tNC_035780.1\t1882143\t1890106\t0\r\n", "NC_035780.1\t1933500\t1933600\tDMR\t53\tNC_035780.1\t1928718\t1940217\t0\r\n", "NC_035780.1\t2538900\t2539000\tDMR\t-50\tNC_035780.1\t2524425\t2553408\t0\r\n", "NC_035780.1\t22276700\t22276800\tDMR\t56\tNC_035780.1\t22269635\t22278631\t0\r\n", "NC_035780.1\t28563400\t28563500\tDMR\t61\tNC_035780.1\t28552157\t28576101\t0\r\n", "NC_035780.1\t31302900\t31303000\tDMR\t-60\tNC_035780.1\t31295876\t31307973\t0\r\n", "NC_035780.1\t35969100\t35969200\tDMR\t-53\tNC_035780.1\t35960923\t35999467\t0\r\n", "NC_035780.1\t38236400\t38236500\tDMR\t50\tNC_035780.1\t38209799\t38243110\t0\r\n", "NC_035781.1\t5386400\t5386500\tDMR\t51\tNC_035781.1\t5383711\t5397505\t0\r\n" ] } ], "source": [ "!head 2019-05-29-Flanking-Analysis/2019-06-05-Genes-Closest-NoOverlap-DMRs.txt" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [], "source": [ "! 
{bedtoolsDirectory}closestBed \\\n", "-a {CGMotifList} \\\n", "-b {geneList} \\\n", "-t all \\\n", "-D ref \\\n", "> 2019-05-29-Flanking-Analysis/2019-05-29-Gene-Closest-NoOverlap-CGmotifs.txt" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t28\t30\tCG_motif\tNC_035780.1\t13578\t14594\t13549\r\n", "NC_035780.1\t54\t56\tCG_motif\tNC_035780.1\t13578\t14594\t13523\r\n", "NC_035780.1\t75\t77\tCG_motif\tNC_035780.1\t13578\t14594\t13502\r\n", "NC_035780.1\t93\t95\tCG_motif\tNC_035780.1\t13578\t14594\t13484\r\n", "NC_035780.1\t103\t105\tCG_motif\tNC_035780.1\t13578\t14594\t13474\r\n", "NC_035780.1\t116\t118\tCG_motif\tNC_035780.1\t13578\t14594\t13461\r\n", "NC_035780.1\t134\t136\tCG_motif\tNC_035780.1\t13578\t14594\t13443\r\n", "NC_035780.1\t159\t161\tCG_motif\tNC_035780.1\t13578\t14594\t13418\r\n", "NC_035780.1\t209\t211\tCG_motif\tNC_035780.1\t13578\t14594\t13368\r\n", "NC_035780.1\t224\t226\tCG_motif\tNC_035780.1\t13578\t14594\t13353\r\n" ] } ], "source": [ "!head 2019-05-29-Flanking-Analysis/2019-05-29-Gene-Closest-NoOverlap-CGmotifs.txt" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## 5. Characterize DML Background" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### mRNA" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 333917\n", "DML background overlaps with mRNA\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {DMLBackground} \\\n", "-b {mRNAList} \\\n", "| wc -l\n", "!echo \"DML background overlaps with mRNA\"" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! 
{bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {DMLBackground} \\\n", "-b {mRNAList} \\\n", "> 2019-06-20-DMLBackground-mRNA.txt" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t100558\t100559\t+\tNC_035780.1\t99840\t106460\r\n", "NC_035780.1\t100575\t100576\t+\tNC_035780.1\t99840\t106460\r\n", "NC_035780.1\t100581\t100582\t+\tNC_035780.1\t99840\t106460\r\n", "NC_035780.1\t100634\t100635\t+\tNC_035780.1\t99840\t106460\r\n", "NC_035780.1\t100643\t100644\t+\tNC_035780.1\t99840\t106460\r\n", "NC_035780.1\t100651\t100652\t+\tNC_035780.1\t99840\t106460\r\n", "NC_035780.1\t100664\t100665\t+\tNC_035780.1\t99840\t106460\r\n", "NC_035780.1\t103268\t103269\t+\tNC_035780.1\t99840\t106460\r\n", "NC_035780.1\t103272\t103273\t+\tNC_035780.1\t99840\t106460\r\n", "NC_035780.1\t103283\t103284\t+\tNC_035780.1\t99840\t106460\r\n" ] } ], "source": [ "!head 2019-06-20-DMLBackground-mRNA.txt" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "### Promoters" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 7835\n", "DML background overlaps with putative promoters\n" ] } ], "source": [ "! {bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a {DMLBackground} \\\n", "-b {promoterTrack} \\\n", "| wc -l\n", "!echo \"DML background overlaps with putative promoters\"" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": true }, "outputs": [], "source": [ "! 
{bedtoolsDirectory}intersectBed \\\n", "-wb \\\n", "-a {DMLBackground} \\\n", "-b {promoterTrack} \\\n", "> 2019-06-20-DMLBackground-Promoters.txt" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t273523\t273524\t+\tNC_035780.1\tGnomon\tmRNA\t272840\t273839\t.\t-\t.\tID=rna20;Parent=gene17;Dbxref=GeneID:111124802,Genbank:XM_022468012.1;Name=XM_022468012.1;gbkey=mRNA;gene=LOC111124802;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 24 samples with support for all annotated introns;product=uncharacterized LOC111124802%2C transcript variant X3;transcript_id=XM_022468012.1\r\n", "NC_035780.1\t273523\t273524\t+\tNC_035780.1\tGnomon\tmRNA\t272840\t273839\t.\t-\t.\tID=rna21;Parent=gene17;Dbxref=GeneID:111124802,Genbank:XM_022468021.1;Name=XM_022468021.1;gbkey=mRNA;gene=LOC111124802;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 24 samples with support for all annotated introns;product=uncharacterized LOC111124802%2C transcript variant X4;transcript_id=XM_022468021.1\r\n", "NC_035780.1\t273523\t273524\t+\tNC_035780.1\tGnomon\tmRNA\t272827\t273826\t.\t-\t.\tID=rna22;Parent=gene17;Dbxref=GeneID:111124802,Genbank:XM_022468004.1;Name=XM_022468004.1;gbkey=mRNA;gene=LOC111124802;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 18 samples with support for all annotated introns;product=uncharacterized LOC111124802%2C transcript variant X2;transcript_id=XM_022468004.1\r\n", 
"NC_035780.1\t273523\t273524\t+\tNC_035780.1\tGnomon\tmRNA\t272840\t273839\t.\t-\t.\tID=rna23;Parent=gene17;Dbxref=GeneID:111124802,Genbank:XM_022467995.1;Name=XM_022467995.1;gbkey=mRNA;gene=LOC111124802;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 22 samples with support for all annotated introns;product=uncharacterized LOC111124802%2C transcript variant X1;transcript_id=XM_022467995.1\r\n", "NC_035780.1\t273523\t273524\t+\tNC_035780.1\tGnomon\tmRNA\t272840\t273839\t.\t-\t.\tID=rna24;Parent=gene17;Dbxref=GeneID:111124802,Genbank:XM_022468030.1;Name=XM_022468030.1;gbkey=mRNA;gene=LOC111124802;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 24 samples with support for all annotated introns;product=uncharacterized LOC111124802%2C transcript variant X5;transcript_id=XM_022468030.1\r\n", "NC_035780.1\t273603\t273604\t+\tNC_035780.1\tGnomon\tmRNA\t272840\t273839\t.\t-\t.\tID=rna20;Parent=gene17;Dbxref=GeneID:111124802,Genbank:XM_022468012.1;Name=XM_022468012.1;gbkey=mRNA;gene=LOC111124802;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 24 samples with support for all annotated introns;product=uncharacterized LOC111124802%2C transcript variant X3;transcript_id=XM_022468012.1\r\n", "NC_035780.1\t273603\t273604\t+\tNC_035780.1\tGnomon\tmRNA\t272840\t273839\t.\t-\t.\tID=rna21;Parent=gene17;Dbxref=GeneID:111124802,Genbank:XM_022468021.1;Name=XM_022468021.1;gbkey=mRNA;gene=LOC111124802;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 24 samples with support for all annotated introns;product=uncharacterized LOC111124802%2C transcript variant X4;transcript_id=XM_022468021.1\r\n", 
"NC_035780.1\t273603\t273604\t+\tNC_035780.1\tGnomon\tmRNA\t272827\t273826\t.\t-\t.\tID=rna22;Parent=gene17;Dbxref=GeneID:111124802,Genbank:XM_022468004.1;Name=XM_022468004.1;gbkey=mRNA;gene=LOC111124802;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 18 samples with support for all annotated introns;product=uncharacterized LOC111124802%2C transcript variant X2;transcript_id=XM_022468004.1\r\n", "NC_035780.1\t273603\t273604\t+\tNC_035780.1\tGnomon\tmRNA\t272840\t273839\t.\t-\t.\tID=rna23;Parent=gene17;Dbxref=GeneID:111124802,Genbank:XM_022467995.1;Name=XM_022467995.1;gbkey=mRNA;gene=LOC111124802;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 22 samples with support for all annotated introns;product=uncharacterized LOC111124802%2C transcript variant X1;transcript_id=XM_022467995.1\r\n", "NC_035780.1\t273603\t273604\t+\tNC_035780.1\tGnomon\tmRNA\t272840\t273839\t.\t-\t.\tID=rna24;Parent=gene17;Dbxref=GeneID:111124802,Genbank:XM_022468030.1;Name=XM_022468030.1;gbkey=mRNA;gene=LOC111124802;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 24 samples with support for all annotated introns;product=uncharacterized LOC111124802%2C transcript variant X5;transcript_id=XM_022468030.1\r\n" ] } ], "source": [ "!head 2019-06-20-DMLBackground-Promoters.txt" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python [default]", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", 
"pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 1 }