Aspen Coyle

afcoyle@uw.edu

2021-07-01

Roberts Lab, UW-SAFS

In script 7_2_manual_clustering_hematv1.6.Rmd, we took libraries aligned to a transcriptome filtered to include only presumed _Hematodinium_ genes, grouped them by host crab (e.g. all libraries for Crab A, Crab B, Crab C...), and clustered genes into modules based on their expression patterns.

We then classified each module's expression as following one of five patterns. For crabs with three time points (the ambient- and lowered-temperature treatment groups), we used the following notation:

- High to low (HTL): Expression decreases over time (regardless of whether the decrease took place on Day 2 or Day 17)

- Low to high (LTH): Expression increases over time (regardless of whether the increase took place on Day 2 or Day 17)

- Low High Low (LHL): Expression increases on Day 2, and then drops on Day 17

- High Low High (HLH): Expression drops on Day 2 and then increases on Day 17

- Mixed (MIX): Expression within the module follows no clear pattern

Crabs in the elevated-temperature treatment group (crabs G, H, and I) had only two time points, so we used a different notation for them:

- LL = expression stays low

- HH = expression stays high

- LH = expression goes from low to high

- HL = expression goes from high to low

- MIX = mixed - no clear pattern of expression within the module

Importantly, **multiple modules within a single crab could be given the same assignment**. This script resolves that by merging the gene lists of all modules that share an assignment within each crab.
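As a rough illustration of the grouping this implies, here is a minimal Python sketch that collects module files by pattern label. It assumes the naming used in the crab folders (`cluster_<LABEL>.txt`, with an optional numeric suffix such as `cluster_HTL2.txt`); the names `MODULE_RE` and `group_by_pattern` are hypothetical and not part of the original workflow.

```python
import re
from collections import defaultdict

# Matches module files such as cluster_HTL.txt or cluster_HTL2.txt and
# captures the pattern label without its numeric suffix (assumed naming scheme).
MODULE_RE = re.compile(r"^cluster_([A-Z]+)\d*\.txt$")


def group_by_pattern(filenames):
    """Group module file names by their expression-pattern label."""
    groups = defaultdict(list)
    for name in sorted(filenames):
        match = MODULE_RE.match(name)
        if match:  # heatmaps and other non-module files are skipped
            groups[match.group(1)].append(name)
    return dict(groups)


# A few file names of the kind found in a crab folder
crab_a = [
    "cluster_HTL.txt",
    "cluster_HTL2.txt",
    "cluster_HTL_heatmap.png",
    "cluster_LHL.txt",
    "cluster_LTH.txt",
    "cluster_LTH2.txt",
    "heatmap.png",
]
print(group_by_pattern(crab_a))
```

Because the label is matched exactly, a two-time-point label such as HL would never pick up a three-time-point file such as `cluster_HLH.txt`.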

First, let's see an example of one crab

In [1]:
!ls ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/

bar_5CtsPerCrab		 cluster_HTL_heatmap.png cluster_LTH2.txt
cluster_HTL.txt		 cluster_LHL.txt	 cluster_LTH2_heatmap.png
cluster_HTL2.txt	 cluster_LHL_heatmap.png cluster_LTH_heatmap.png
cluster_HTL2_heatmap.png cluster_LTH.txt	 heatmap.png


And let's also see what each cluster looks like

In [2]:
!head ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/cluster_HTL.txt

"178"	"359"	"463"
"TRINITY_DN4727_c0_g1_i1"	1.95949	0.549945	0
"TRINITY_DN4752_c0_g1_i1"	1.30359	0.133042	0
"TRINITY_DN77_c0_g1_i1"	0.688943	0.0441782	0
"TRINITY_DN88_c0_g1_i4"	2.58364	1.9115	0
"TRINITY_DN88_c0_g2_i3"	4.87918	1.49507	0
"TRINITY_DN10_c2_g1_i1"	6.40426	3.84929	0
"TRINITY_DN61_c0_g1_i3"	0.162944	0	0
"TRINITY_DN21_c0_g1_i14"	1.38704	0	0
"TRINITY_DN21_c0_g1_i2"	3.849	1.17845	0


Looks like we need to remove the first line of each file. Otherwise, when we merge modules, the header lines will be included in the merged file. The header isn't very meaningful anyway, since the columns simply correspond to the Day 0, Day 2, and Day 17 samples.
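For reference, the header-dropping step can be sketched in Python as follows. This assumes the module files are tab-delimited with quoted transcript IDs, as the `head` output suggests; `read_module_rows` is a hypothetical helper, and the sample is just the first few lines shown above.

```python
import csv
import io


def read_module_rows(handle):
    """Return (transcript_id, [expression values]) rows, skipping the header line."""
    reader = csv.reader(handle, delimiter="\t")  # assumes tab-delimited input
    next(reader, None)  # drop the header line of column labels
    return [(row[0], [float(v) for v in row[1:]]) for row in reader if row]


# First lines of cluster_HTL.txt for Crab A, as shown in the head output
sample = io.StringIO(
    '"178"\t"359"\t"463"\n'
    '"TRINITY_DN4727_c0_g1_i1"\t1.95949\t0.549945\t0\n'
    '"TRINITY_DN4752_c0_g1_i1"\t1.30359\t0.133042\t0\n'
)
rows = read_module_rows(sample)
print(rows[0])
```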

Now, let's see how many crab folders we have

In [3]:
!ls ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/

Crab_A	Crab_C	Crab_E	Crab_G	Crab_I
Crab_B	Crab_D	Crab_F	Crab_H	bar_5CtsPerCrab_merged_modules_raw_counts.txt


Looks good! We can move on.

## Crab A

We'll now start on merging all modules for Crab A

Let's take another look at the current modules for Crab A

In [5]:
!ls ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/

bar_5CtsPerCrab		 cluster_HTL_heatmap.png cluster_LTH2.txt
cluster_HTL.txt		 cluster_LHL.txt	 cluster_LTH2_heatmap.png
cluster_HTL2.txt	 cluster_LHL_heatmap.png cluster_LTH_heatmap.png
cluster_HTL2_heatmap.png cluster_LTH.txt	 heatmap.png


In [6]:
# Make new directory for merged modules (-p avoids an error if it already exists)
!mkdir -p ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/merged_modules

# Merge all HTL modules, dropping each file's header line
# The -name pattern is quoted so the shell passes it to find unexpanded
!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A -maxdepth 1 -name "cluster_HTL*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/merged_modules/HTL_merged.txt

# Merge all LTH modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A -maxdepth 1 -name "cluster_LTH*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/merged_modules/LTH_merged.txt

# Merge all LHL modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A -maxdepth 1 -name "cluster_LHL*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/merged_modules/LHL_merged.txt

# Won't merge MIX or HLH modules, as none are present in this crab

Check that we did this right by examining the number of lines. Each merged file should have one fewer line per source module, since we removed the headers.

In [7]:
!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/cluster_*txt

 68 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/cluster_HTL.txt
 4 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/cluster_HTL2.txt
 73 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/cluster_LHL.txt
 20 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/cluster_LTH.txt
 70 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/cluster_LTH2.txt
 235 total


In [8]:
!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/merged_modules/*merged.txt

 70 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/merged_modules/HTL_merged.txt
 72 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/merged_modules/LHL_merged.txt
 88 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/merged_modules/LTH_merged.txt
 230 total


Looks good! We can move on.
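The check above can also be written down explicitly: a merged file should contain the sum of its source files' lines minus one header per source file. Here is a minimal Python sketch using the Crab A counts reported by `wc -l`; the helper name `expected_merged_lines` is hypothetical.

```python
def expected_merged_lines(source_counts):
    """Lines expected in a merged file: all source lines minus one header per source file."""
    return sum(source_counts) - len(source_counts)


# wc -l counts for Crab A, copied from the output above
crab_a_sources = {"HTL": [68, 4], "LHL": [73], "LTH": [20, 70]}
crab_a_merged = {"HTL": 70, "LHL": 72, "LTH": 88}

for label, counts in crab_a_sources.items():
    assert expected_merged_lines(counts) == crab_a_merged[label], label
print("Crab A line counts are consistent")
```

The same arithmetic explains the totals: 235 source lines minus 5 headers gives the 230 merged lines.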

## Crab B

In [9]:
!ls ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/

bar_5CtsPerCrab		 cluster_HTL2.txt	 cluster_LHL_heatmap.png
cluster_HLH.txt		 cluster_HTL2_heatmap.png cluster_LTH.txt
cluster_HLH_heatmap.png cluster_HTL_heatmap.png cluster_LTH_heatmap.png
cluster_HTL.txt		 cluster_LHL.txt	 heatmap.png


In [10]:
# Make new directory for merged modules
!mkdir -p ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/merged_modules

# Merge all HTL modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B -maxdepth 1 -name "cluster_HTL*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/merged_modules/HTL_merged.txt

# Merge all LTH modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B -maxdepth 1 -name "cluster_LTH*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/merged_modules/LTH_merged.txt

# Merge all HLH modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B -maxdepth 1 -name "cluster_HLH*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/merged_modules/HLH_merged.txt

# Merge all LHL modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B -maxdepth 1 -name "cluster_LHL*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/merged_modules/LHL_merged.txt

# Won't merge MIX modules, as none are present in this crab

Check that we did this right by examining the number of lines. Each merged file should have one fewer line per source module, since we removed the headers.

In [11]:
!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/cluster_*txt

 11 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/cluster_HLH.txt
 109 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/cluster_HTL.txt
 9 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/cluster_HTL2.txt
 99 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/cluster_LHL.txt
 33 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/cluster_LTH.txt
 261 total


In [12]:
!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/merged_modules/*merged.txt

 10 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/merged_modules/HLH_merged.txt
 116 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/merged_modules/HTL_merged.txt
 98 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/merged_modules/LHL_merged.txt
 32 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/merged_modules/LTH_merged.txt
 256 total


Looks good! We can move on.

## Crab C

In [13]:
!ls ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/

bar_5CtsPerCrab		 cluster_HTL2_heatmap.png cluster_LTH2.txt
cluster_HLH.txt		 cluster_HTL_heatmap.png cluster_LTH2_heatmap.png
cluster_HLH_heatmap.png cluster_LHL.txt	 cluster_LTH_heatmap.png
cluster_HTL.txt		 cluster_LHL_heatmap.png heatmap.png
cluster_HTL2.txt	 cluster_LTH.txt


In [14]:
# Make new directory for merged modules
!mkdir -p ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/merged_modules

# Merge all HTL modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C -maxdepth 1 -name "cluster_HTL*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/merged_modules/HTL_merged.txt

# Merge all LTH modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C -maxdepth 1 -name "cluster_LTH*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/merged_modules/LTH_merged.txt

# Merge all HLH modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C -maxdepth 1 -name "cluster_HLH*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/merged_modules/HLH_merged.txt

# Merge all LHL modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C -maxdepth 1 -name "cluster_LHL*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/merged_modules/LHL_merged.txt

# Won't merge MIX modules, as none are present in this crab

Check that we did this right by examining the number of lines. Each merged file should have one fewer line per source module, since we removed the headers.

In [15]:
!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/cluster_*txt

 52 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/cluster_HLH.txt
 22 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/cluster_HTL.txt
 12 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/cluster_HTL2.txt
 53 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/cluster_LHL.txt
 82 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/cluster_LTH.txt
 35 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/cluster_LTH2.txt
 256 total


In [16]:
!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/merged_modules/*merged.txt

 51 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/merged_modules/HLH_merged.txt
 32 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/merged_modules/HTL_merged.txt
 52 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/merged_modules/LHL_merged.txt
 115 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/merged_modules/LTH_merged.txt
 250 total


Looks good! We can move on.

## Crab D

We'll now start on merging all modules for Crab D

Let's take a look at the current modules for Crab D

In [17]:
!ls ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D/

bar_5CtsPerCrab		 cluster_HTL_heatmap.png cluster_LHL_heatmap.png
cluster_HLH.txt		 cluster_LHL.txt	 cluster_LTH.txt
cluster_HLH_heatmap.png cluster_LHL2.txt	 cluster_LTH_heatmap.png
cluster_HTL.txt		 cluster_LHL2_heatmap.png heatmap.png


In [18]:
# Make new directory for merged modules
!mkdir -p ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D/merged_modules

# Merge all HLH modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D -maxdepth 1 -name "cluster_HLH*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D/merged_modules/HLH_merged.txt

# Merge all HTL modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D -maxdepth 1 -name "cluster_HTL*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D/merged_modules/HTL_merged.txt

# Merge all LTH modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D -maxdepth 1 -name "cluster_LTH*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D/merged_modules/LTH_merged.txt

# Merge all LHL modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D -maxdepth 1 -name "cluster_LHL*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D/merged_modules/LHL_merged.txt

# Won't merge MIX modules, as none are present in this crab

Check that we did this right by examining the number of lines. Each merged file should have one fewer line per source module, since we removed the headers.

In [19]:
!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D/cluster_*txt

 26 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D/cluster_HLH.txt
 25 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D/cluster_HTL.txt
 28 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D/cluster_LHL.txt
 16 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D/cluster_LHL2.txt
 56 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D/cluster_LTH.txt
 151 total


In [20]:
!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D/merged_modules/*merged.txt

 25 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D/merged_modules/HLH_merged.txt
 24 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D/merged_modules/HTL_merged.txt
 42 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D/merged_modules/LHL_merged.txt
 55 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_D/merged_modules/LTH_merged.txt
 146 total


Looks good! We can move on.

## Crab E

In [21]:
!ls ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E/

bar_5CtsPerCrab		 cluster_LHL.txt	 cluster_LTH2.txt
cluster_HTL.txt		 cluster_LHL2.txt	 cluster_LTH2_heatmap.png
cluster_HTL2.txt	 cluster_LHL2_heatmap.png cluster_LTH_heatmap.png
cluster_HTL2_heatmap.png cluster_LHL_heatmap.png heatmap.png
cluster_HTL_heatmap.png cluster_LTH.txt


In [22]:
# Make new directory for merged modules
!mkdir -p ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E/merged_modules

# Merge all HTL modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E -maxdepth 1 -name "cluster_HTL*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E/merged_modules/HTL_merged.txt

# Merge all LTH modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E -maxdepth 1 -name "cluster_LTH*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E/merged_modules/LTH_merged.txt

# Merge all LHL modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E -maxdepth 1 -name "cluster_LHL*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E/merged_modules/LHL_merged.txt

# Won't merge MIX or HLH modules, as none are present in this crab

Check that we did this right by examining the number of lines. Each merged file should have one fewer line per source module, since we removed the headers.

In [23]:
!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E/cluster_*txt

 46 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E/cluster_HTL.txt
 15 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E/cluster_HTL2.txt
 11 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E/cluster_LHL.txt
 38 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E/cluster_LHL2.txt
 64 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E/cluster_LTH.txt
 21 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E/cluster_LTH2.txt
 195 total


In [24]:
!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E/merged_modules/*merged.txt

 59 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E/merged_modules/HTL_merged.txt
 47 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E/merged_modules/LHL_merged.txt
 83 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_E/merged_modules/LTH_merged.txt
 189 total


Looks good! We can move on.

## Crab F

We'll now start on merging all modules for Crab F

Let's take a look at the current modules for Crab F

In [25]:
!ls ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F/

bar_5CtsPerCrab		 cluster_HTL2_heatmap.png cluster_LTH2.txt
cluster_HLH.txt		 cluster_HTL_heatmap.png cluster_LTH2_heatmap.png
cluster_HLH_heatmap.png cluster_LHL.txt	 cluster_LTH_heatmap.png
cluster_HTL.txt		 cluster_LHL_heatmap.png heatmap.png
cluster_HTL2.txt	 cluster_LTH.txt


In [26]:
# Make new directory for merged modules
!mkdir -p ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F/merged_modules

# Merge all HLH modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F -maxdepth 1 -name "cluster_HLH*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F/merged_modules/HLH_merged.txt

# Merge all HTL modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F -maxdepth 1 -name "cluster_HTL*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F/merged_modules/HTL_merged.txt

# Merge all LTH modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F -maxdepth 1 -name "cluster_LTH*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F/merged_modules/LTH_merged.txt

# Merge all LHL modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F -maxdepth 1 -name "cluster_LHL*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F/merged_modules/LHL_merged.txt

# Won't merge MIX modules, as none are present in this crab

Check that we did this right by examining the number of lines. Each merged file should have one fewer line per source module, since we removed the headers.

In [27]:
!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F/cluster_*txt

 6 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F/cluster_HLH.txt
 40 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F/cluster_HTL.txt
 16 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F/cluster_HTL2.txt
 12 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F/cluster_LHL.txt
 62 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F/cluster_LTH.txt
 6 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F/cluster_LTH2.txt
 142 total


In [28]:
!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F/merged_modules/*merged.txt

 5 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F/merged_modules/HLH_merged.txt
 54 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F/merged_modules/HTL_merged.txt
 11 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F/merged_modules/LHL_merged.txt
 66 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_F/merged_modules/LTH_merged.txt
 136 total


Looks good! We can move on.

## Crab G

In [29]:
!ls ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_G/

bar_5CtsPerCrab		cluster_MIX2.txt	 cluster_MIX_heatmap.png
cluster_HL.txt		cluster_MIX2_heatmap.png heatmap.png
cluster_HL_heatmap.png	cluster_MIX3.txt
cluster_MIX.txt		cluster_MIX3_heatmap.png


In [30]:
# Make new directory for merged modules
!mkdir -p ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_G/merged_modules

# Merge all HL modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_G -maxdepth 1 -name "cluster_HL*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_G/merged_modules/HL_merged.txt

# Merge all MIX modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_G -maxdepth 1 -name "cluster_MIX*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_G/merged_modules/MIX_merged.txt

# Won't merge HH, LH, or LL modules, as none are present in this crab

Check that we did this right by examining the number of lines. Each merged file should have one fewer line per source module, since we removed the headers.

In [31]:
!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_G/cluster_*txt

 3 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_G/cluster_HL.txt
 145 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_G/cluster_MIX.txt
 60 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_G/cluster_MIX2.txt
 8 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_G/cluster_MIX3.txt
 216 total


In [32]:
!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_G/merged_modules/*merged.txt

 2 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_G/merged_modules/HL_merged.txt
 210 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_G/merged_modules/MIX_merged.txt
 212 total


Looks good! We can move on.

## Crab H

In [33]:
!ls ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_H/

bar_5CtsPerCrab		 cluster_MIX3.txt	 cluster_MIX_heatmap.png
cluster_MIX.txt		 cluster_MIX3_heatmap.png heatmap.png
cluster_MIX2.txt	 cluster_MIX4.txt
cluster_MIX2_heatmap.png cluster_MIX4_heatmap.png


In [34]:
# Make new directory for merged modules
!mkdir -p ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_H/merged_modules

# Merge all MIX modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_H -maxdepth 1 -name "cluster_MIX*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_H/merged_modules/MIX_merged.txt

# Won't merge any other modules, as only MIX are present in this crab

Check that we did this right by examining the number of lines. The merged file should have one fewer line per source module, since we removed the headers.

In [35]:
!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_H/cluster_*txt

 88 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_H/cluster_MIX.txt
 35 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_H/cluster_MIX2.txt
 32 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_H/cluster_MIX3.txt
 10 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_H/cluster_MIX4.txt
 165 total


In [36]:
!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_H/merged_modules/*merged.txt

161 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_H/merged_modules/MIX_merged.txt


Looks good! We can move on.

## Crab I

In [37]:
!ls ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_I/

bar_5CtsPerCrab		cluster_MIX2.txt	 cluster_MIX_heatmap.png
cluster_LH.txt		cluster_MIX2_heatmap.png heatmap.png
cluster_LH_heatmap.png	cluster_MIX4.txt
cluster_MIX.txt		cluster_MIX4_heatmap.png


In [38]:
# Make new directory for merged modules
!mkdir -p ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_I/merged_modules

# Merge all LH modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_I -maxdepth 1 -name "cluster_LH*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_I/merged_modules/LH_merged.txt

# Merge all MIX modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_I -maxdepth 1 -name "cluster_MIX*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_I/merged_modules/MIX_merged.txt

# Won't merge HH, HL, or LL modules, as none are present in this crab

Check that we did this right by examining the number of lines. Each merged file should have one fewer line per source module, since we removed the headers.

In [39]:
!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_I/cluster_*txt

 20 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_I/cluster_LH.txt
 71 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_I/cluster_MIX.txt
 39 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_I/cluster_MIX2.txt
 19 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_I/cluster_MIX4.txt
 149 total


In [40]:
!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_I/merged_modules/*merged.txt

 19 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_I/merged_modules/LH_merged.txt
 126 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_I/merged_modules/MIX_merged.txt
 145 total


Looks good! We can move on.
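The per-crab cells above all repeat the same steps. As a hedged sketch only (this is not how the merge was actually run), the same logic could be written once in Python and looped over every crab folder. It assumes the `cluster_<LABEL>[N].txt` naming seen above; `merge_modules` and `merge_all` are hypothetical names.

```python
import re
from collections import defaultdict
from pathlib import Path

# Module files look like cluster_HTL.txt or cluster_HTL2.txt (assumed naming scheme)
MODULE_RE = re.compile(r"^cluster_([A-Z]+)\d*\.txt$")


def merge_modules(crab_dir):
    """Merge same-pattern module files in one crab folder, dropping header lines.

    Writes merged_modules/<LABEL>_merged.txt and returns {label: merged line count}.
    """
    crab_dir = Path(crab_dir)
    groups = defaultdict(list)
    for path in sorted(crab_dir.glob("cluster_*.txt")):
        match = MODULE_RE.match(path.name)
        if match:
            groups[match.group(1)].append(path)

    out_dir = crab_dir / "merged_modules"
    out_dir.mkdir(exist_ok=True)
    merged_counts = {}
    for label, paths in groups.items():
        lines = []
        for path in paths:
            # Skip the first (header) line of every source file
            lines.extend(path.read_text().splitlines()[1:])
        out_file = out_dir / f"{label}_merged.txt"
        out_file.write_text("".join(line + "\n" for line in lines))
        merged_counts[label] = len(lines)
    return merged_counts


def merge_all(all_genes_dir):
    """Run merge_modules on every Crab_* folder and collect the line counts."""
    return {
        crab.name: merge_modules(crab)
        for crab in sorted(Path(all_genes_dir).glob("Crab_*"))
        if crab.is_dir()
    }
```

Calling `merge_all("../output/manual_clustering/hemat_transcriptomev1.6/all_genes")` would then return the merged line counts per crab, which could be compared against the `wc -l` output above.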

## Done merging

Now, let's get a count of the number of lines in each module in each crab

## Line Counts of Modules

In [41]:
!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_*/merged_modules/*merged.txt

 70 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/merged_modules/HTL_merged.txt
 72 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/merged_modules/LHL_merged.txt
 88 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/merged_modules/LTH_merged.txt
 10 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/merged_modules/HLH_merged.txt
 116 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/merged_modules/HTL_merged.txt
 98 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/merged_modules/LHL_merged.txt
 32 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_B/merged_modules/LTH_merged.txt
 51 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/merged_modules/HLH_merged.txt
 32 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C/merged_modules/HTL_merged.txt
 52 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_C

We'll now write the above line counts to a file, which we'll then turn into a table using R.

In [42]:
!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_*/merged_modules/*merged.txt > ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/merged_modules_raw_counts.txt
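The table itself is built in R. Purely as an illustration of what that step has to do, here is a small Python sketch that parses `wc -l` output into one row per crab and module. The path layout is taken from the output above; `parse_wc_output` is a hypothetical helper, and the sample contains only the Crab A lines plus the matching total line from the Crab A check.

```python
import re

# Captures the line count, crab letter, and module label from one wc -l output line
WC_LINE_RE = re.compile(
    r"^\s*(\d+)\s+.*/Crab_([A-Z])/merged_modules/([A-Z]+)_merged\.txt$"
)


def parse_wc_output(text):
    """Turn wc -l output into a list of {crab, module, lines} records."""
    rows = []
    for line in text.splitlines():
        match = WC_LINE_RE.match(line)
        if match:  # the trailing "total" line does not match and is skipped
            count, crab, module = match.groups()
            rows.append({"crab": crab, "module": module, "lines": int(count)})
    return rows


sample = """\
 70 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/merged_modules/HTL_merged.txt
 72 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/merged_modules/LHL_merged.txt
 88 ../output/manual_clustering/hemat_transcriptomev1.6/all_genes/Crab_A/merged_modules/LTH_merged.txt
 230 total
"""
for row in parse_wc_output(sample):
    print(row)
```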