{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Aspen Coyle, afcoyle@uw.edu\n", "\n", "2021/03/20\n", "\n", "Roberts lab at SAFS\n", "\n", "### Removing duplicate gene entries\n", "\n", "Finished getting all our GO terms! Now it's time to build the input for GO-MWU.\n", "We need 2 tables:\n", "- A 2-column table of genes and GO terms\n", "- A 2-column table of genes and log2 fold change without repeated genes\n", "\n", "The table of genes and log2 fold change is created in R, in the script 11_2_prod_cluster_GOterms.Rmd.\n", "While getting GO terms, we made a 2-column tab-separated table of genes and GO terms.\n", "Now we need to eliminate all repeated genes in that table. To do this, we'll use the nrify_GOtable.pl script,\n", "which can be found in the [GitHub repo for GO-MWU](https://github.com/z0on/GO_MWU).\n", "\n", "We'll put these files in a subdirectory within the scripts directory, as this is what GO-MWU calls for. All GO-MWU files are also within that directory." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'/mnt/c/Users/acoyl/Documents/GitHub/hemat_bairdi_transcriptome/scripts'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pwd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## All Samples" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Eliminate gene repeats, concatenate GO terms for cbai_v2.0, all samples, blue module\n", "!11_4_running_GOMWU/./nrify_GOtable.pl ../output/accession_n_GOids/WGCNA_modules/cbai_transcriptomev2.0/all_crabs_blue_GOIDs.txt > 11_4_running_GOMWU/cbai2.0_all_crabs_blue_module_GOIDs_norepeats.txt" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1803 11_4_running_GOMWU/cbai2.0_all_crabs_blue_module_GOIDs_norepeats.txt\n" ] } ], "source": [ "# See length of file without repeats\n", 
"!wc -l 11_4_running_GOMWU/cbai2.0_all_crabs_blue_module_GOIDs_norepeats.txt" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1803\n" ] } ], "source": [ "# Manually see how many repeats we have - looks indentical, script running properly\n", "!sort ../output/accession_n_GOids/WGCNA_modules/cbai_transcriptomev2.0/all_crabs_blue_GOIDs.txt | uniq | wc -l" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# Continue with cbai_v2.0, all samples, cyan module\n", "!11_4_running_GOMWU/./nrify_GOtable.pl ../output/accession_n_GOids/WGCNA_modules/cbai_transcriptomev2.0/all_crabs_cyan_GOIDs.txt > 11_4_running_GOMWU/cbai2.0_all_crabs_cyan_module_GOIDs_norepeats.txt" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# Continue with cbai_v2.0, all samples, salmon module\n", "!11_4_running_GOMWU/./nrify_GOtable.pl ../output/accession_n_GOids/WGCNA_modules/cbai_transcriptomev2.0/all_crabs_salmon_GOIDs.txt > 11_4_running_GOMWU/cbai2.0_all_crabs_salmon_module_GOIDs_norepeats.txt" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# Continue with cbai_v2.0, all samples, yellow module\n", "!11_4_running_GOMWU/./nrify_GOtable.pl ../output/accession_n_GOids/WGCNA_modules/cbai_transcriptomev2.0/all_crabs_yellow_GOIDs.txt > 11_4_running_GOMWU/cbai2.0_all_crabs_yellow_module_GOIDs_norepeats.txt" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# Continue with cbai_v4.0, all samples, brown module\n", "!11_4_running_GOMWU/./nrify_GOtable.pl ../output/accession_n_GOids/WGCNA_modules/cbai_transcriptomev4.0/all_crabs_brown_GOIDs.txt > 11_4_running_GOMWU/cbai4.0_all_crabs_brown_module_GOIDs_norepeats.txt" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# Continue with cbai_v4.0, all samples, 
black module\n", "!11_4_running_GOMWU/./nrify_GOtable.pl ../output/accession_n_GOids/WGCNA_modules/cbai_transcriptomev4.0/all_crabs_black_GOIDs.txt > 11_4_running_GOMWU/cbai4.0_all_crabs_black_module_GOIDs_norepeats.txt" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# Continue with hemat_v1.6, all samples, brown module\n", "!11_4_running_GOMWU/./nrify_GOtable.pl ../output/accession_n_GOids/WGCNA_modules/hemat_transcriptomev1.6/all_crabs_brown_GOIDs.txt > 11_4_running_GOMWU/hemat1.6_all_crabs_brown_module_GOIDs_norepeats.txt" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "# Continue with hemat_v1.6, all samples, pink module\n", "!11_4_running_GOMWU/./nrify_GOtable.pl ../output/accession_n_GOids/WGCNA_modules/hemat_transcriptomev1.6/all_crabs_pink_GOIDs.txt > 11_4_running_GOMWU/hemat1.6_all_crabs_pink_module_GOIDs_norepeats.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Ambient and Lowered-temp Samples\n", "\n", "As described in earlier scripts (check 11_1), the WGCNA run on these samples was explicitly to look at change in _Hematodinium_ level over time, and thus excluded elevated-temp samples (for which only one timepoint with qPCR data is available per crab)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# Eliminate gene repeats, concatenate GO terms for cbai_v2.0, amb+low samples, blue module\n", "!11_4_running_GOMWU/./nrify_GOtable.pl ../output/accession_n_GOids/WGCNA_modules/cbai_transcriptomev2.0/amb_low_blue_GOIDs.txt > 11_4_running_GOMWU/cbai2.0_amb_low_blue_module_GOIDs_norepeats.txt" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1659 11_4_running_GOMWU/cbai2.0_amb_low_blue_module_GOIDs_norepeats.txt\n" ] } ], "source": [ "# See length of file without repeats\n", "!wc -l 
11_4_running_GOMWU/cbai2.0_amb_low_blue_module_GOIDs_norepeats.txt" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1659\n" ] } ], "source": [ "# Manually count unique lines in the input - looks identical to the count above, script running properly\n", "!sort ../output/accession_n_GOids/WGCNA_modules/cbai_transcriptomev2.0/amb_low_blue_GOIDs.txt | uniq | wc -l" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "# Continue with cbai_v4.0, amb + low, red module\n", "!11_4_running_GOMWU/./nrify_GOtable.pl ../output/accession_n_GOids/WGCNA_modules/cbai_transcriptomev4.0/amb_low_red_GOIDs.txt > 11_4_running_GOMWU/cbai4.0_amb_low_red_module_GOIDs_norepeats.txt" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "# Continue with hemat_v1.6, amb + low, green module\n", "!11_4_running_GOMWU/./nrify_GOtable.pl ../output/accession_n_GOids/WGCNA_modules/hemat_transcriptomev1.6/amb_low_green_GOIDs.txt > 11_4_running_GOMWU/hemat1.6_amb_low_green_module_GOIDs_norepeats.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Well done! Each comparison should now have both inputs needed for GO-MWU." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" } }, "nbformat": 4, "nbformat_minor": 4 }