{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Aspen Coyle\n", "\n", "afcoyle@uw.edu\n", "\n", "2021-07-01\n", "\n", "Roberts Lab, UW-SAFS" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In script 7_2_manual_clustering_hematv1.6.Rmd, we took libraries aligned to a transcriptome filtered to include only presumed _Hematodinium_ genes, grouped them by host crab (e.g. all libraries for Crab A, then Crab B, and so on), and clustered genes into modules based on their expression patterns.\n", "\n", "We then assigned each module to one of five expression patterns. For crabs with three time points (the ambient- and lowered-temperature treatment crabs), we used the following notation:\n", "\n", "- High to low (HTL): Expression decreases over time (regardless of whether the decrease took place on Day 2 or Day 17)\n", "\n", "- Low to high (LTH): Expression increases over time (regardless of whether the increase took place on Day 2 or Day 17)\n", "\n", "- Low High Low (LHL): Expression increases on Day 2, and then drops on Day 17\n", "\n", "- High Low High (HLH): Expression drops on Day 2 and then increases on Day 17\n", "\n", "- Mixed (MIX): Expression within the module follows no clear pattern\n", "\n", "Crabs in the elevated-temperature treatment group (Crabs G, H, and I) had only two time points, so a different notation was used for them:\n", "\n", "- LL = expression stays low\n", "\n", "- HH = expression stays high\n", "\n", "- LH = expression goes from low to high\n", "\n", "- HL = expression goes from high to low\n", "\n", "- MIX = mixed - no clear pattern of expression within the module" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Importantly, **multiple modules within a single crab could be given the same assignment**. This script resolves that by merging the gene lists of same-pattern modules within each crab." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, let's see an example of one crab" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "bar_5CtsPerCrab\t\t cluster_HTL_heatmap.png cluster_LTH2.txt\n", "cluster_HTL.txt\t\t cluster_LHL.txt\t cluster_LTH2_heatmap.png\n", "cluster_HTL2.txt\t cluster_LHL_heatmap.png cluster_LTH_heatmap.png\n", "cluster_HTL2_heatmap.png cluster_LTH.txt\t heatmap.png\n" ] } ], "source": [ "!ls ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And let's also see what each cluster looks like" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\"178\"\t\"359\"\t\"463\"\n", "\"TRINITY_DN4727_c0_g1_i1\"\t1.95949\t0.549945\t0\n", "\"TRINITY_DN4752_c0_g1_i1\"\t1.30359\t0.133042\t0\n", "\"TRINITY_DN77_c0_g1_i1\"\t0.688943\t0.0441782\t0\n", "\"TRINITY_DN88_c0_g1_i4\"\t2.58364\t1.9115\t0\n", "\"TRINITY_DN88_c0_g2_i3\"\t4.87918\t1.49507\t0\n", "\"TRINITY_DN10_c2_g1_i1\"\t6.40426\t3.84929\t0\n", "\"TRINITY_DN61_c0_g1_i3\"\t0.162944\t0\t0\n", "\"TRINITY_DN21_c0_g1_i14\"\t1.38704\t0\t0\n", "\"TRINITY_DN21_c0_g1_i2\"\t3.849\t1.17845\t0\n" ] } ], "source": [ "!head ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/cluster_HTL.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looks like we need to remove the first line of each file - otherwise, when we merge modules, the header line will be included. 
And since columns correspond to days 0, 2, and 17 samples, it's not too meaningful\n", "\n", "Now, let's see how many crab folders we have" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Crab_A\tCrab_C\tCrab_E\tCrab_G\tCrab_I\n", "Crab_B\tCrab_D\tCrab_F\tCrab_H\tbar_5CtsPerCrab_merged_modules_raw_counts.txt\n" ] } ], "source": [ "!ls ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looks good! We can move on." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Crab A\n", "\n", "We'll now start on merging all modules for Crab A\n", "\n", "Let's take another look at the current modules for Crab A" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "bar_5CtsPerCrab\t\t cluster_HTL_heatmap.png cluster_LTH2.txt\n", "cluster_HTL.txt\t\t cluster_LHL.txt\t cluster_LTH2_heatmap.png\n", "cluster_HTL2.txt\t cluster_LHL_heatmap.png cluster_LTH_heatmap.png\n", "cluster_HTL2_heatmap.png cluster_LTH.txt\t heatmap.png\n" ] } ], "source": [ "!ls ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# Make new directory for merged modules\n", "!mkdir ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/merged_modules\n", "\n", "# Merge all HTL modules\n", "!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A -maxdepth 1 -name cluster_HTL*txt | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/merged_modules/HTL_merged.txt\n", "\n", "# Merge all LTH modules\n", "!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A -maxdepth 1 -name cluster_LTH*txt | xargs -n 1 tail -n +2 > 
../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/merged_modules/LTH_merged.txt\n", "\n", "# Merge all LHL modules\n", "!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A -maxdepth 1 -name cluster_LHL*txt | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/merged_modules/LHL_merged.txt\n", "\n", "# Won't merge MIX or HLH modules, as none are present in this crab" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check that the merge worked by comparing line counts. Each merged file should have one fewer line per input module, since the header lines were removed (for HTL: 68 + 4 - 2 = 70)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 68 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/cluster_HTL.txt\n", " 4 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/cluster_HTL2.txt\n", " 73 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/cluster_LHL.txt\n", " 20 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/cluster_LTH.txt\n", " 70 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/cluster_LTH2.txt\n", " 235 total\n" ] } ], "source": [ "!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/cluster_*txt" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 70 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/merged_modules/HTL_merged.txt\n", " 72 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/merged_modules/LHL_merged.txt\n", " 88 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/merged_modules/LTH_merged.txt\n", " 230 total\n" ] } ], "source": [ "!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/merged_modules/*merged.txt" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "Looks good! We can move on." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Crab B" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "bar_5CtsPerCrab\t\t cluster_HTL2.txt\t cluster_LHL_heatmap.png\n", "cluster_HLH.txt\t\t cluster_HTL2_heatmap.png cluster_LTH.txt\n", "cluster_HLH_heatmap.png cluster_HTL_heatmap.png cluster_LTH_heatmap.png\n", "cluster_HTL.txt\t\t cluster_LHL.txt\t heatmap.png\n" ] } ], "source": [ "!ls ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# Make new directory for merged modules\n", "!mkdir ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/merged_modules\n", "\n", "# Merge all HTL modules\n", "!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B -maxdepth 1 -name cluster_HTL*txt | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/merged_modules/HTL_merged.txt\n", "\n", "# Merge all LTH modules\n", "!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B -maxdepth 1 -name cluster_LTH*txt | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/merged_modules/LTH_merged.txt\n", "\n", "# Merge all HLH modules\n", "!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B -maxdepth 1 -name cluster_HLH*txt | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/merged_modules/HLH_merged.txt\n", "\n", "# Merge all LHL modules\n", "!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B -maxdepth 1 -name cluster_LHL*txt | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/merged_modules/LHL_merged.txt\n", "\n", "# Won't merge MIX modules, as 
none are present in this crab" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check we did this right by examining number of lines. There will be slightly fewer in merged_modules, as we removed headers" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 11 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/cluster_HLH.txt\n", " 109 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/cluster_HTL.txt\n", " 9 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/cluster_HTL2.txt\n", " 99 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/cluster_LHL.txt\n", " 33 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/cluster_LTH.txt\n", " 261 total\n" ] } ], "source": [ "!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/cluster_*txt" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 10 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/merged_modules/HLH_merged.txt\n", " 116 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/merged_modules/HTL_merged.txt\n", " 98 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/merged_modules/LHL_merged.txt\n", " 32 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/merged_modules/LTH_merged.txt\n", " 256 total\n" ] } ], "source": [ "!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/merged_modules/*merged.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looks good! We can move on." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Crab C" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "bar_5CtsPerCrab\t\t cluster_HTL2_heatmap.png cluster_LTH2.txt\n", "cluster_HLH.txt\t\t cluster_HTL_heatmap.png cluster_LTH2_heatmap.png\n", "cluster_HLH_heatmap.png cluster_LHL.txt\t cluster_LTH_heatmap.png\n", "cluster_HTL.txt\t\t cluster_LHL_heatmap.png heatmap.png\n", "cluster_HTL2.txt\t cluster_LTH.txt\n" ] } ], "source": [ "!ls ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "# Make new directory for merged modules\n", "!mkdir ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/merged_modules\n", "\n", "# Merge all HTL modules\n", "!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C -maxdepth 1 -name cluster_HTL*txt | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/merged_modules/HTL_merged.txt\n", "\n", "# Merge all LTH modules\n", "!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C -maxdepth 1 -name cluster_LTH*txt | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/merged_modules/LTH_merged.txt\n", "\n", "# Merge all HLH modules\n", "!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C -maxdepth 1 -name cluster_HLH*txt | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/merged_modules/HLH_merged.txt\n", "\n", "# Merge all LHL modules\n", "!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C -maxdepth 1 -name cluster_LHL*txt | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/merged_modules/LHL_merged.txt\n", "\n", "# Won't merge MIX modules, as none are present 
in this crab" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check we did this right by examining number of lines. There will be slightly fewer in merged_modules, as we removed headers" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 52 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/cluster_HLH.txt\n", " 22 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/cluster_HTL.txt\n", " 12 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/cluster_HTL2.txt\n", " 53 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/cluster_LHL.txt\n", " 82 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/cluster_LTH.txt\n", " 35 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/cluster_LTH2.txt\n", " 256 total\n" ] } ], "source": [ "!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/cluster_*txt" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 51 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/merged_modules/HLH_merged.txt\n", " 32 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/merged_modules/HTL_merged.txt\n", " 52 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/merged_modules/LHL_merged.txt\n", " 115 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/merged_modules/LTH_merged.txt\n", " 250 total\n" ] } ], "source": [ "!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/merged_modules/*merged.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looks good! We can move on." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Crab D\n", "\n", "We'll now start on merging all modules for Crab D\n", "\n", "Let's take a look at the current modules for Crab D" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "bar_5CtsPerCrab\t\t cluster_HTL_heatmap.png cluster_LHL_heatmap.png\n", "cluster_HLH.txt\t\t cluster_LHL.txt\t cluster_LTH.txt\n", "cluster_HLH_heatmap.png cluster_LHL2.txt\t cluster_LTH_heatmap.png\n", "cluster_HTL.txt\t\t cluster_LHL2_heatmap.png heatmap.png\n" ] } ], "source": [ "!ls ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D/" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "# Make new directory for merged modules\n", "!mkdir ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D/merged_modules\n", "\n", "# Merge all HLH modules\n", "!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D -maxdepth 1 -name cluster_HLH*txt | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D/merged_modules/HLH_merged.txt\n", "\n", "# Merge all HTL modules\n", "!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D -maxdepth 1 -name cluster_HTL*txt | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D/merged_modules/HTL_merged.txt\n", "\n", "# Merge all LTH modules\n", "!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D -maxdepth 1 -name cluster_LTH*txt | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D/merged_modules/LTH_merged.txt\n", "\n", "# Merge all LHL modules\n", "!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D -maxdepth 1 -name cluster_LHL*txt | xargs -n 1 tail -n +2 > 
../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D/merged_modules/LHL_merged.txt\n", "\n", "# Won't merge MIX modules, as none are present in this crab" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check we did this right by examining number of lines. There will be slightly fewer in merged_modules, as we removed headers" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 26 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D/cluster_HLH.txt\n", " 25 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D/cluster_HTL.txt\n", " 28 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D/cluster_LHL.txt\n", " 16 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D/cluster_LHL2.txt\n", " 56 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D/cluster_LTH.txt\n", " 151 total\n" ] } ], "source": [ "!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D/cluster_*txt" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 25 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D/merged_modules/HLH_merged.txt\n", " 24 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D/merged_modules/HTL_merged.txt\n", " 42 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D/merged_modules/LHL_merged.txt\n", " 55 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D/merged_modules/LTH_merged.txt\n", " 146 total\n" ] } ], "source": [ "!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D/merged_modules/*merged.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looks good! We can move on." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Crab E" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "bar_5CtsPerCrab\t\t cluster_LHL.txt\t cluster_LTH2.txt\n", "cluster_HTL.txt\t\t cluster_LHL2.txt\t cluster_LTH2_heatmap.png\n", "cluster_HTL2.txt\t cluster_LHL2_heatmap.png cluster_LTH_heatmap.png\n", "cluster_HTL2_heatmap.png cluster_LHL_heatmap.png heatmap.png\n", "cluster_HTL_heatmap.png cluster_LTH.txt\n" ] } ], "source": [ "!ls ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E/" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "# Make new directory for merged modules\n", "!mkdir ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E/merged_modules\n", "\n", "# Merge all HTL modules\n", "!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E -maxdepth 1 -name cluster_HTL*txt | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E/merged_modules/HTL_merged.txt\n", "\n", "# Merge all LTH modules\n", "!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E -maxdepth 1 -name cluster_LTH*txt | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E/merged_modules/LTH_merged.txt\n", "\n", "# Merge all LHL modules\n", "!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E -maxdepth 1 -name cluster_LHL*txt | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E/merged_modules/LHL_merged.txt\n", "\n", "# Won't merge MIX or HLH modules, as none are present in this crab" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check we did this right by examining number of lines. 
There will be slightly fewer in merged_modules, as we removed headers" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 46 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E/cluster_HTL.txt\n", " 15 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E/cluster_HTL2.txt\n", " 11 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E/cluster_LHL.txt\n", " 38 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E/cluster_LHL2.txt\n", " 64 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E/cluster_LTH.txt\n", " 21 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E/cluster_LTH2.txt\n", " 195 total\n" ] } ], "source": [ "!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E/cluster_*txt" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 59 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E/merged_modules/HTL_merged.txt\n", " 47 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E/merged_modules/LHL_merged.txt\n", " 83 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E/merged_modules/LTH_merged.txt\n", " 189 total\n" ] } ], "source": [ "!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E/merged_modules/*merged.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looks good! We can move on." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Crab F\n", "\n", "We'll now start on merging all modules for Crab F\n", "\n", "Let's take another look at the current modules for Crab F" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "bar_5CtsPerCrab\t\t cluster_HTL2_heatmap.png cluster_LTH2.txt\n", "cluster_HLH.txt\t\t cluster_HTL_heatmap.png cluster_LTH2_heatmap.png\n", "cluster_HLH_heatmap.png cluster_LHL.txt\t cluster_LTH_heatmap.png\n", "cluster_HTL.txt\t\t cluster_LHL_heatmap.png heatmap.png\n", "cluster_HTL2.txt\t cluster_LTH.txt\n" ] } ], "source": [ "!ls ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F/" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "# Make new directory for merged modules\n", "!mkdir ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F/merged_modules\n", "\n", "# Merge all HLH modules\n", "!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F -maxdepth 1 -name cluster_HLH*txt | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F/merged_modules/HLH_merged.txt\n", "\n", "# Merge all HTL modules\n", "!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F -maxdepth 1 -name cluster_HTL*txt | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F/merged_modules/HTL_merged.txt\n", "\n", "# Merge all LTH modules\n", "!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F -maxdepth 1 -name cluster_LTH*txt | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F/merged_modules/LTH_merged.txt\n", "\n", "# Merge all LHL modules\n", "!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F -maxdepth 1 -name cluster_LHL*txt | xargs -n 1 tail -n +2 > 
../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F/merged_modules/LHL_merged.txt\n", "\n", "# Won't merge MIX modules, as none are present in this crab" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check we did this right by examining number of lines. There will be slightly fewer in merged_modules, as we removed headers" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 6 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F/cluster_HLH.txt\n", " 40 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F/cluster_HTL.txt\n", " 16 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F/cluster_HTL2.txt\n", " 12 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F/cluster_LHL.txt\n", " 62 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F/cluster_LTH.txt\n", " 6 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F/cluster_LTH2.txt\n", " 142 total\n" ] } ], "source": [ "!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F/cluster_*txt" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 5 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F/merged_modules/HLH_merged.txt\n", " 54 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F/merged_modules/HTL_merged.txt\n", " 11 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F/merged_modules/LHL_merged.txt\n", " 66 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F/merged_modules/LTH_merged.txt\n", " 136 total\n" ] } ], "source": [ "!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F/merged_modules/*merged.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looks good! We can move on." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Crab G" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "bar_5CtsPerCrab\t\tcluster_MIX2.txt\t cluster_MIX_heatmap.png\n", "cluster_HL.txt\t\tcluster_MIX2_heatmap.png heatmap.png\n", "cluster_HL_heatmap.png\tcluster_MIX3.txt\n", "cluster_MIX.txt\t\tcluster_MIX3_heatmap.png\n" ] } ], "source": [ "!ls ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_G/" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "# Make new directory for merged modules\n", "!mkdir ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_G/merged_modules\n", "\n", "# Merge all HL modules\n", "!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_G -maxdepth 1 -name cluster_HL*txt | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_G/merged_modules/HL_merged.txt\n", "\n", "# Merge all MIX modules\n", "!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_G -maxdepth 1 -name cluster_MIX*txt | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_G/merged_modules/MIX_merged.txt\n", "\n", "# Won't merge HH, LH, or LL modules, as none are present in this crab" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check we did this right by examining number of lines. 
There will be slightly fewer in merged_modules, as we removed headers" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 3 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_G/cluster_HL.txt\n", " 145 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_G/cluster_MIX.txt\n", " 60 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_G/cluster_MIX2.txt\n", " 8 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_G/cluster_MIX3.txt\n", " 216 total\n" ] } ], "source": [ "!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_G/cluster_*txt" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 2 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_G/merged_modules/HL_merged.txt\n", " 210 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_G/merged_modules/MIX_merged.txt\n", " 212 total\n" ] } ], "source": [ "!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_G/merged_modules/*merged.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looks good! We can move on." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Crab H" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "bar_5CtsPerCrab\t\t cluster_MIX3.txt\t cluster_MIX_heatmap.png\n", "cluster_MIX.txt\t\t cluster_MIX3_heatmap.png heatmap.png\n", "cluster_MIX2.txt\t cluster_MIX4.txt\n", "cluster_MIX2_heatmap.png cluster_MIX4_heatmap.png\n" ] } ], "source": [ "!ls ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_H/" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "# Make new directory for merged modules\n", "!mkdir ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_H/merged_modules\n", "\n", "# Merge all MIX modules\n", "!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_H -maxdepth 1 -name cluster_MIX*txt | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_H/merged_modules/MIX_merged.txt\n", "\n", "# Won't merge any other modules, as only MIX are present in this crab" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check we did this right by examining number of lines. 
There will be slightly fewer in merged_modules, as we removed headers" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 88 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_H/cluster_MIX.txt\n", " 35 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_H/cluster_MIX2.txt\n", " 32 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_H/cluster_MIX3.txt\n", " 10 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_H/cluster_MIX4.txt\n", " 165 total\n" ] } ], "source": [ "!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_H/cluster_*txt" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "161 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_H/merged_modules/MIX_merged.txt\n" ] } ], "source": [ "!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_H/merged_modules/*merged.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looks good! We can move on." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Crab I" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "bar_5CtsPerCrab\t\tcluster_MIX2.txt\t cluster_MIX_heatmap.png\n", "cluster_LH.txt\t\tcluster_MIX2_heatmap.png heatmap.png\n", "cluster_LH_heatmap.png\tcluster_MIX4.txt\n", "cluster_MIX.txt\t\tcluster_MIX4_heatmap.png\n" ] } ], "source": [ "!ls ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_I/" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [], "source": [ "# Make new directory for merged modules\n", "!mkdir ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_I/merged_modules\n", "\n", "# Merge all LH modules\n", "!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_I -maxdepth 1 -name cluster_LH*txt | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_I/merged_modules/LH_merged.txt\n", "\n", "# Merge all MIX modules\n", "!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_I -maxdepth 1 -name cluster_MIX*txt | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_I/merged_modules/MIX_merged.txt\n", "\n", "# Won't merge HH, HL, or LL modules, as none are present in this crab" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check we did this right by examining number of lines. 
The merged total should be one line lower per input file, as we removed each file's header" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 20 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_I/cluster_LH.txt\n", " 71 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_I/cluster_MIX.txt\n", " 39 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_I/cluster_MIX2.txt\n", " 19 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_I/cluster_MIX4.txt\n", " 149 total\n" ] } ], "source": [ "!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_I/cluster_*txt" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 19 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_I/merged_modules/LH_merged.txt\n", " 126 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_I/merged_modules/MIX_merged.txt\n", " 145 total\n" ] } ], "source": [ "!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_I/merged_modules/*merged.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looks good! We can move on." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Done merging" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, let's get a count of the number of lines in each module in each crab" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Line Counts of Modules" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 70 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/merged_modules/HTL_merged.txt\n", " 72 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/merged_modules/LHL_merged.txt\n", " 88 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/merged_modules/LTH_merged.txt\n", " 10 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/merged_modules/HLH_merged.txt\n", " 116 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/merged_modules/HTL_merged.txt\n", " 98 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/merged_modules/LHL_merged.txt\n", " 32 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/merged_modules/LTH_merged.txt\n", " 51 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/merged_modules/HLH_merged.txt\n", " 32 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/merged_modules/HTL_merged.txt\n", " 52 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/merged_modules/LHL_merged.txt\n", " 115 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/merged_modules/LTH_merged.txt\n", " 25 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D/merged_modules/HLH_merged.txt\n", " 24 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D/merged_modules/HTL_merged.txt\n", " 42 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D/merged_modules/LHL_merged.txt\n", " 55 
../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D/merged_modules/LTH_merged.txt\n", " 59 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E/merged_modules/HTL_merged.txt\n", " 47 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E/merged_modules/LHL_merged.txt\n", " 83 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E/merged_modules/LTH_merged.txt\n", " 5 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F/merged_modules/HLH_merged.txt\n", " 54 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F/merged_modules/HTL_merged.txt\n", " 11 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F/merged_modules/LHL_merged.txt\n", " 66 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F/merged_modules/LTH_merged.txt\n", " 2 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_G/merged_modules/HL_merged.txt\n", " 210 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_G/merged_modules/MIX_merged.txt\n", " 161 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_H/merged_modules/MIX_merged.txt\n", " 19 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_I/merged_modules/LH_merged.txt\n", " 126 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_I/merged_modules/MIX_merged.txt\n", " 1725 total\n" ] } ], "source": [ "!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_*/merged_modules/*merged.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll now write the above line counts to a file, which we'll then turn into a table using R" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_*/merged_modules/*merged.txt > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/merged_modules_raw_counts.txt" ] } ], 
"metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" } }, "nbformat": 4, "nbformat_minor": 4 }