Use bedtools to see where DMLs and MACAU loci are located.

Differentially methylated loci (DMLs) between the two Olympia oyster populations, Hood Canal (HC) and South Sound (SS), were identified using MethylKit. The file is: analyses/dml25.bed

MACAU was used to identify loci at which methylation is associated with a phenotype, in our case shell length, while controlling for relatedness.

In [1]:
pwd
Out[1]:
'/Users/laura/Documents/roberts-lab/paper-oly-mbdbs-gen/code'

Make directory for BED output

In [2]:
mkdir -p ../analyses/BEDtools/

Preview DML and MACAU loci bed files

In [2]:
DML = "../analyses/dml25.bed"
!head {DML}
!wc -l {DML}
Contig102998	2220	2222	26
Contig104531	8145	8147	-37
Contig109515	3377	3379	54
Contig1104	15920	15922	29
Contig128059	154	156	-27
Contig129435	3172	3174	-26
Contig1297	49910	49912	25
Contig131260	1798	1800	-29
Contig132309	816	818	28
Contig13829	2520	2522	25
      51 ../analyses/dml25.bed
In [3]:
macau75 = "../analyses/macau/macau.sign.perc.meth.10x75.bed"
!head {macau75}
!wc -l {macau75}
Contig1059	31143	31143
Contig10592	247	247
Contig107703	2906	2906
Contig109522	3554	3554
Contig118194	2767	2767
Contig123099	987	987
Contig131341	6320	6320
Contig132	51449	51449
Contig148559	191	191
Contig160335	582	582
      72 ../analyses/macau/macau.sign.perc.meth.10x75.bed
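
The two previews differ in interval width: the DML records span 2 bp (end minus start), while the MACAU records have identical start and end coordinates. The following is a minimal Python sketch for checking this, using a few lines copied from the previews above. The function name `interval_widths` is illustrative and not part of the original analysis.

```python
from collections import Counter

def interval_widths(bed_lines):
    """Count interval widths (end - start) across BED-formatted lines."""
    widths = Counter()
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        start, end = int(fields[1]), int(fields[2])
        widths[end - start] += 1
    return widths

# A few records copied from the head output above
dml_preview = [
    "Contig102998\t2220\t2222\t26",
    "Contig104531\t8145\t8147\t-37",
]
macau_preview = [
    "Contig1059\t31143\t31143",
    "Contig10592\t247\t247",
]

dml_widths = interval_widths(dml_preview)
macau_widths = interval_widths(macau_preview)
print(dict(dml_widths))
print(dict(macau_widths))
```

Zero-width records (start equal to end) may be treated differently across tools and versions, so it could be worth confirming that the MACAU loci intersect with the feature files as intended.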

Set file paths for feature files

Olurida_v081-20190709.gene.gff - genes
Olurida_v081-20190709.CDS.gff - Coding regions of genes
Olurida_v081-20190709.exon.gff - Exons
Olurida_v081-20190709.mRNA.gff - mRNA
Olurida_v081_TE-Cg.gff - Transposable elements
20190709-Olurida_v081.stringtie.gtf - alternative splice variants

Note: an intron track may also become available (Steven was working on that). New 3' and 5' UTR tracks could be added as well.

In [10]:
gene = "../genome-features/Olurida_v081-20190709.gene.gff"
CDS = "../genome-features/Olurida_v081-20190709.CDS.gff"
exon = "../genome-features/Olurida_v081-20190709.exon.gff"
mRNA = "../genome-features/Olurida_v081-20190709.mRNA.gff"
TE = "../genome-features/Olurida_v081_TE-Cg.gff"
ASV = "../genome-features/20190709-Olurida_v081.stringtie.gtf"
AllLoci = "../analyses/macau/macau-all-loci.bed"
AllLoci10x75 = "../analyses/macau/macau-all-loci.10x75.bed"
In [11]:
! bedtools intersect \
Tool:    bedtools intersect (aka intersectBed)
Version: v2.29.0
Summary: Report overlaps between two feature files.

Usage:   bedtools intersect [OPTIONS] -a <bed/gff/vcf/bam> -b <bed/gff/vcf/bam>

	Note: -b may be followed with multiple databases and/or 
	wildcard (*) character(s). 
Options: 
	-wa	Write the original entry in A for each overlap.

	-wb	Write the original entry in B for each overlap.
		- Useful for knowing _what_ A overlaps. Restricted by -f and -r.

	-loj	Perform a "left outer join". That is, for each feature in A
		report each overlap with B.  If no overlaps are found, 
		report a NULL feature for B.

	-wo	Write the original A and B entries plus the number of base
		pairs of overlap between the two features.
		- Overlaps restricted by -f and -r.
		  Only A features with overlap are reported.

	-wao	Write the original A and B entries plus the number of base
		pairs of overlap between the two features.
		- Overlapping features restricted by -f and -r.
		  However, A features w/o overlap are also reported
		  with a NULL B feature and overlap = 0.

	-u	Write the original A entry _once_ if _any_ overlaps found in B.
		- In other words, just report the fact >=1 hit was found.
		- Overlaps restricted by -f and -r.

	-c	For each entry in A, report the number of overlaps with B.
		- Reports 0 for A entries that have no overlap with B.
		- Overlaps restricted by -f, -F, -r, and -s.

	-C	For each entry in A, separately report the number of
		- overlaps with each B file on a distinct line.
		- Reports 0 for A entries that have no overlap with B.
		- Overlaps restricted by -f, -F, -r, and -s.

	-v	Only report those entries in A that have _no overlaps_ with B.
		- Similar to "grep -v" (an homage).

	-ubam	Write uncompressed BAM output. Default writes compressed BAM.

	-s	Require same strandedness.  That is, only report hits in B
		that overlap A on the _same_ strand.
		- By default, overlaps are reported without respect to strand.

	-S	Require different strandedness.  That is, only report hits in B
		that overlap A on the _opposite_ strand.
		- By default, overlaps are reported without respect to strand.

	-f	Minimum overlap required as a fraction of A.
		- Default is 1E-9 (i.e., 1bp).
		- FLOAT (e.g. 0.50)

	-F	Minimum overlap required as a fraction of B.
		- Default is 1E-9 (i.e., 1bp).
		- FLOAT (e.g. 0.50)

	-r	Require that the fraction overlap be reciprocal for A AND B.
		- In other words, if -f is 0.90 and -r is used, this requires
		  that B overlap 90% of A and A _also_ overlaps 90% of B.

	-e	Require that the minimum fraction be satisfied for A OR B.
		- In other words, if -e is used with -f 0.90 and -F 0.10 this requires
		  that either 90% of A is covered OR 10% of  B is covered.
		  Without -e, both fractions would have to be satisfied.

	-split	Treat "split" BAM or BED12 entries as distinct BED intervals.

	-g	Provide a genome file to enforce consistent chromosome sort order
		across input files. Only applies when used with -sorted option.

	-nonamecheck	For sorted data, don't throw an error if the file has different naming conventions
			for the same chromosome. ex. "chr1" vs "chr01".

	-sorted	Use the "chromsweep" algorithm for sorted (-k1,1 -k2,2n) input.

	-names	When using multiple databases, provide an alias for each that
		will appear instead of a fileId when also printing the DB record.

	-filenames	When using multiple databases, show each complete filename
			instead of a fileId when also printing the DB record.

	-sortout	When using multiple databases, sort the output DB hits
			for each record.

	-bed	If using BAM input, write output as BED.

	-header	Print the header from the A file prior to results.

	-nobuf	Disable buffered output. Using this option will cause each line
		of output to be printed as it is generated, rather than saved
		in a buffer. This will make printing large output files 
		noticeably slower, but can be useful in conjunction with
		other software tools and scripts that need to process one
		line of bedtools output at a time.

	-iobuf	Specify amount of memory to use for input buffer.
		Takes an integer argument. Optional suffixes K/M/G supported.
		Note: currently has no effect with compressed files.

Notes: 
	(1) When a BAM file is used for the A file, the alignment is retained if overlaps exist,
	and excluded if an overlap cannot be found.  If multiple overlaps exist, they are not
	reported, as we are only testing for one or more overlaps.




***** ERROR: No input file given. Exiting. *****

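bedtools options to use:
-u - Write the original A entry once if any overlaps are found in B, i.e. just report the fact that >=1 hit was found
-v - Only report those entries in A that have no overlaps with B
-wb - Write the original entry in B for each overlap
-a - File A
-b - File B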
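
To make the difference between reporting hits once and reporting non-hits concrete, here is a small Python sketch that mimics the `-u` and `-v` behaviour on toy intervals. It is an illustration only: the data are hypothetical, the helper names are invented, and the overlap rule assumes 0-based, half-open coordinates, which is simpler than what bedtools does internally.

```python
def overlaps(a, b):
    """True if two (contig, start, end) intervals share at least 1 bp.

    Assumes 0-based, half-open coordinates as in BED. Zero-width
    intervals never overlap anything under this simple rule.
    """
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def report_u(a_entries, b_entries):
    """Like -u: report each A entry once if it overlaps any B entry."""
    return [a for a in a_entries if any(overlaps(a, b) for b in b_entries)]

def report_v(a_entries, b_entries):
    """Like -v: report only A entries with no overlap in B."""
    return [a for a in a_entries if not any(overlaps(a, b) for b in b_entries)]

# Hypothetical toy data, not taken from the real files
loci = [("ContigA", 100, 102), ("ContigA", 500, 502), ("ContigB", 10, 12)]
features = [("ContigA", 90, 200), ("ContigA", 95, 150)]

hits = report_u(loci, features)
misses = report_v(loci, features)
print(hits)    # the first locus appears once, although it overlaps two features
print(misses)
```

Every A entry ends up in exactly one of the two lists, which is why the `-u` and `-v` counts for the same A and B inputs add up to the total number of A entries.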

1. DMLs

In [18]:
! echo "Total methylated loci:" 
! cat {AllLoci} | wc -l

! echo "Loci differentially methylated between SS and HC populations:"
! cat {DML} | wc -l 

!echo "Loci that overlap with genes:"
! bedtools intersect \
-u \
-a {DML} \
-b {gene} | wc -l

!echo "Loci that overlap with exons:"
! bedtools intersect \
-u \
-a {DML} \
-b {exon} | wc -l

!echo "Loci that overlap with coding sequences:"
! bedtools intersect \
-u \
-a {DML} \
-b {CDS} | wc -l

!echo "Loci that overlap with mRNA:"
! bedtools intersect \
-u \
-a {DML} \
-b {mRNA} | wc -l

!echo "Loci that overlap with transposable elements:"
! bedtools intersect \
-u \
-a {DML} \
-b {TE} | wc -l

!echo "Loci that overlap with alternative splice variants:"
! bedtools intersect \
-u \
-a {DML} \
-b {ASV} | wc -l

!echo "Loci that do not overlap with known features:"
! bedtools intersect \
-v \
-a {DML} \
-b {gene} {exon} {CDS} {mRNA} {TE} {ASV} | wc -l
Total methylated loci:
  256043
Loci differentially methylated between SS and HC populations:
      51
Loci that overlap with genes:
      31
Loci that overlap with exons:
      27
Loci that overlap with coding sequences:
      25
Loci that overlap with mRNA:
      31
Loci that overlap with transposable elements:
       3
Loci that overlap with alternative splice variants:
      40
Loci that do not overlap with known features:
       9

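Save DML lists to files. Note: the "intragenic" file holds the -v output, i.e. loci that overlap none of the listed features, so despite its name it contains the unannotated (likely intergenic) loci.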

In [14]:
! bedtools intersect -wb -a {DML} -b {gene} >  ../analyses/BEDtools/DML-gene.txt
! bedtools intersect -wb -a {DML} -b {exon} >  ../analyses/BEDtools/DML-exon.txt
! bedtools intersect -wb -a {DML} -b {CDS} >  ../analyses/BEDtools/DML-CDS.txt
! bedtools intersect -wb -a {DML} -b {mRNA} >  ../analyses/BEDtools/DML-mRNA.txt
! bedtools intersect -wb -a {DML} -b {TE} >  ../analyses/BEDtools/DML-TE.txt
! bedtools intersect -wb -a {DML} -b {ASV} >  ../analyses/BEDtools/DML-ASV.txt
! bedtools intersect -v -a {DML} -b {gene} {exon} {CDS} {mRNA} {TE} {ASV} >  ../analyses/BEDtools/DML-intragenic.txt
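
For downstream use, each line written with `-wb` holds the A record followed by the B record. Below is a minimal Python sketch for splitting such a line, assuming a 4-column DML BED record followed by a standard 9-column GFF record. The example line is hypothetical (invented contig, source, and ID), and `parse_wb_line` is an illustrative helper rather than part of the original workflow.

```python
def parse_wb_line(line, n_a_cols=4):
    """Split one `bedtools intersect -wb` line into locus and feature parts.

    n_a_cols is the number of columns in file A (4 for the DML BED file,
    3 for the MACAU BED file). The B part is assumed to be 9-column GFF.
    """
    fields = line.rstrip("\n").split("\t")
    a_fields, b_fields = fields[:n_a_cols], fields[n_a_cols:]
    locus = {
        "contig": a_fields[0],
        "start": int(a_fields[1]),
        "end": int(a_fields[2]),
        "extra": a_fields[3:],  # e.g. methylation difference for DMLs
    }
    feature = {
        "seqid": b_fields[0],
        "type": b_fields[2],
        "start": int(b_fields[3]),
        "end": int(b_fields[4]),
        "strand": b_fields[6],
        "attributes": b_fields[8],
    }
    return locus, feature

# Hypothetical line for illustration only
example = "ContigX\t100\t102\t26\tContigX\texample_source\tgene\t50\t900\t.\t+\t.\tID=hypothetical_gene_1"
locus, feature = parse_wb_line(example)
print(locus)
print(feature)
```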

Count background loci (all methylated loci) that overlap each feature, then save the feature lists to files

In [15]:
! echo "genes" 
! bedtools intersect -u -a {AllLoci} -b {gene} | wc -l
! echo "exon" 
! bedtools intersect -u -a {AllLoci} -b {exon} | wc -l
! echo "CDS" 
! bedtools intersect -u -a {AllLoci} -b {CDS} | wc -l
! echo "mRNA"
! bedtools intersect -u -a {AllLoci} -b {mRNA} | wc -l
! echo "TE" 
! bedtools intersect -u -a {AllLoci} -b {TE} | wc -l
! echo "ASV" 
! bedtools intersect -u -a {AllLoci} -b {ASV} | wc -l
! echo "intragenic" 
! bedtools intersect -v -a {AllLoci} -b {gene} {exon} {CDS} {mRNA} {TE} {ASV} | wc -l
genes
  123087
exon
   93846
CDS
   89367
mRNA
  123087
TE
   15510
ASV
  199852
intragenic
   48414
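
The DML and background counts are easier to compare as percentages of their respective totals. The sketch below does that arithmetic in Python using the numbers printed above (51 DMLs, 256043 background loci). It is purely descriptive and makes no claim about statistical significance.

```python
total_dml, total_background = 51, 256043

# feature: (DML count, background count), copied from the outputs above
counts = {
    "genes": (31, 123087),
    "exons": (27, 93846),
    "CDS": (25, 89367),
    "mRNA": (31, 123087),
    "TE": (3, 15510),
    "ASV": (40, 199852),
    "no known feature": (9, 48414),
}

def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole

summary = {
    feature: (percent(dml, total_dml), percent(bg, total_background))
    for feature, (dml, bg) in counts.items()
}

for feature, (dml_pct, bg_pct) in summary.items():
    print(f"{feature:>18}: DML {dml_pct:5.1f}%  background {bg_pct:5.1f}%")
```

Because the feature categories overlap (a locus in a CDS is also in an exon and a gene), the percentages within a column are not expected to sum to 100.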
In [16]:
! bedtools intersect -wb -a {AllLoci} -b {gene} >  ../analyses/BEDtools/AllLoci-gene.txt
! bedtools intersect -wb -a {AllLoci} -b {exon} >  ../analyses/BEDtools/AllLoci-exon.txt
! bedtools intersect -wb -a {AllLoci} -b {CDS} >  ../analyses/BEDtools/AllLoci-CDS.txt
! bedtools intersect -wb -a {AllLoci} -b {mRNA} >  ../analyses/BEDtools/AllLoci-mRNA.txt
! bedtools intersect -wb -a {AllLoci} -b {TE} >  ../analyses/BEDtools/AllLoci-TE.txt
! bedtools intersect -wb -a {AllLoci} -b {ASV} >  ../analyses/BEDtools/AllLoci-ASV.txt
! bedtools intersect -v -a {AllLoci} -b {gene} {exon} {CDS} {mRNA} {TE} {ASV} >  ../analyses/BEDtools/AllLoci-intragenic.txt

2. MACAU Loci

In [19]:
! echo "Total methylated loci (10x across 75% of samples):" 
! cat {AllLoci10x75} | wc -l

! echo "Loci associated with shell length (MACAU):"
! cat {macau75} | wc -l 

!echo "Loci that overlap with genes:"
! bedtools intersect \
-u \
-a {macau75} \
-b {gene} | wc -l

!echo "Loci that overlap with exons:"
! bedtools intersect \
-u \
-a {macau75} \
-b {exon} | wc -l

!echo "Loci that overlap with coding sequences:"
! bedtools intersect \
-u \
-a {macau75} \
-b {CDS} | wc -l

!echo "Loci that overlap with mRNA:"
! bedtools intersect \
-u \
-a {macau75} \
-b {mRNA} | wc -l

!echo "Loci that overlap with transposable elements:"
! bedtools intersect \
-u \
-a {macau75} \
-b {TE} | wc -l

!echo "Loci that overlap with alternative splice variants:"
! bedtools intersect \
-u \
-a {macau75} \
-b {ASV} | wc -l

!echo "Loci that do not overlap with known features:"
! bedtools intersect \
-v \
-a {macau75} \
-b {gene} {exon} {CDS} {mRNA} {TE} {ASV} | wc -l
Total methylated loci (10x across 75% of samples):
  108490
Loci associated with shell length (MACAU):
      72
Loci that overlap with genes:
      34
Loci that overlap with exons:
      33
Loci that overlap with coding sequences:
      32
Loci that overlap with mRNA:
      34
Loci that overlap with transposable elements:
       0
Loci that overlap with alternative splice variants:
      60
Loci that do not overlap with known features:
      12
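
A quick sanity check on these counts, sketched in Python with the numbers printed above: the per-feature counts sum to far more than 72 because the categories are nested, whereas the loci with and without any feature should partition the total.

```python
total = 72
no_feature = 12
per_feature = {"genes": 34, "exons": 33, "CDS": 32, "mRNA": 34, "TE": 0, "ASV": 60}

# Loci overlapping at least one feature = total minus the -v count
with_any_feature = total - no_feature
sum_of_categories = sum(per_feature.values())
largest = max(per_feature, key=per_feature.get)

print(with_any_feature, sum_of_categories, largest)
```

Here the number of loci with at least one feature equals the ASV count, which is consistent with (but does not by itself prove) every annotated MACAU locus falling inside a StringTie feature.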

Save MACAU lists to files

In [20]:
! bedtools intersect -wb -a {macau75} -b {gene} >  ../analyses/BEDtools/macau75-gene.txt
! bedtools intersect -wb -a {macau75} -b {exon} >  ../analyses/BEDtools/macau75-exon.txt
! bedtools intersect -wb -a {macau75} -b {CDS} >  ../analyses/BEDtools/macau75-CDS.txt
! bedtools intersect -wb -a {macau75} -b {mRNA} >  ../analyses/BEDtools/macau75-mRNA.txt
! bedtools intersect -wb -a {macau75} -b {TE} >  ../analyses/BEDtools/macau75-TE.txt
! bedtools intersect -wb -a {macau75} -b {ASV} >  ../analyses/BEDtools/macau75-ASV.txt
! bedtools intersect -v -a {macau75} -b {gene} {exon} {CDS} {mRNA} {TE} {ASV} >  ../analyses/BEDtools/macau75-intragenic.txt

Prepare blastx annotation files to merge with DML and MACAU results

The actual merging will occur in a later RStudio notebook.

In [6]:
! curl https://raw.githubusercontent.com/sr320/paper-oly-mbdbs-gen/master/analyses/Olgene_blastx_uniprot.05.tab \
    > ../data/Olgene_blastx_uniprot.05.tab
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1037k  100 1037k    0     0  1383k      0 --:--:-- --:--:-- --:--:-- 1381k
In [7]:
#convert pipes to tab
!tr '|' '\t' < ../data/Olgene_blastx_uniprot.05.tab \
> ../data/Olgene_blastx_uniprot.05-20191122.tab
In [9]:
#Reduce the number of columns using awk. Sort, and save as a new file.
!awk -v OFS='\t' '{print $1, $3, $13}' \
< ../data/Olgene_blastx_uniprot.05-20191122.tab | sort \
> ../data/Olgene_blastx_uniprot.05-20191122-sort.tab
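
The `tr` and `awk` steps can be mirrored in Python to see why columns 1, 3, and 13 are the ones kept. This sketch assumes the input is 12-column BLAST tabular output whose subject ID has the form `sp|ACCESSION|ENTRY_NAME`. That layout is an assumption, not something shown in this notebook. In the example record only the contig, accession, entry name, and e-value come from the previews; the remaining values are invented.

```python
def reduce_blast_line(line):
    """Return (query, accession, e-value) from one BLAST tabular line.

    Replacing '|' with tabs turns the single subject-ID column into three,
    so the e-value moves from column 11 to column 13 (1-based, as in awk).
    """
    fields = line.replace("|", "\t").split()
    return fields[0], fields[2], fields[12]

# Hypothetical record: alignment statistics are invented for illustration
example = (
    "Contig100018:1232-2375\tsp|P31695|NOTC4_MOUSE\t30.0\t100\t70\t0"
    "\t1\t300\t1\t100\t2.23e-06\t50.0"
)

reduced = reduce_blast_line(example)
print("\t".join(reduced))
```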
In [10]:
! head ../data/Olgene_blastx_uniprot.05-20191122-sort.tab
Contig100018:1232-2375	P31695	2.23e-06
Contig100073:8284-10076	H2A0L8	6.98e-24
Contig100101:2158-2821	O35796	3.67e-28
Contig100107:1089-2009	Q2KMM2	8.78e-15
Contig100114:437-2094	Q9V4M2	1.41e-09
Contig100163:2402-6678	P23708	2.55e-18
Contig100166:542-2054	G5EBR3	2.08e-11
Contig100170:472-1350	Q5F3T9	9.14e-42
Contig100188:460-2761	Q8TD26	1.35e-18
Contig100206:5719-12338	Q2HJH1	1.51e-14
In [70]:
#Uniprot codes have ".1" appended, so those need to be removed.
#Isolate the contig column name with cut
#Flip order of characters with rev
#Delete the first two characters of the reversed string (i.e. the trailing ".1") with cut -c 3-
#Flip order of characters back with rev
#Save to a new file; this can later be added as a new column to the annotated table with paste

!cut -f1 temporary/olurida-blast-sort.tab \
| rev \
| cut -c 3- \
| rev \
> temporary/olurida-blast-sort2.tab
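
The `rev | cut -c 3- | rev` idiom always removes exactly two trailing characters, whatever they are. As an alternative sketch in Python, the helper below (a hypothetical name, not part of the original notebook) removes a trailing `.<digits>` suffix only when one is present, which also covers multi-digit versions and IDs without a version.

```python
def strip_version(accession):
    """Drop a trailing '.<digits>' version suffix if present."""
    base, dot, version = accession.rpartition(".")
    if dot and version.isdigit():
        return base
    return accession

# Illustrative inputs: with a one-digit version, a two-digit version, and none
examples = ["P31695.1", "Q9V4M2.12", "H2A0L8"]
stripped = [strip_version(x) for x in examples]
print(stripped)
```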
In [20]:
!curl http://owl.fish.washington.edu/halfshell/bu-alanine-wd/17-07-20/uniprot-SP-GO.sorted \
    > ../data/uniprot-SP-GO-sorted.tab
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  340M  100  340M    0     0  2083k      0  0:02:47  0:02:47 --:--:-- 2187k
In [21]:
! head ../data/uniprot-SP-GO-sorted.tab
A0A023GPI8	LECA_CANBL	reviewed	Lectin alpha chain (CboL) [Cleaved into: Lectin beta chain; Lectin gamma chain]		Canavalia boliviana	237			mannose binding [GO:0005537]; metal ion binding [GO:0046872]	mannose binding [GO:0005537]; metal ion binding [GO:0046872]	GO:0005537; GO:0046872
A0A023GPJ0	CDII_ENTCC	reviewed	Immunity protein CdiI	cdiI ECL_04450.1	Enterobacter cloacae subsp. cloacae (strain ATCC 13047 / DSM 30054 / NBRC 13535 / NCDC 279-56)	145					
A0A023PXA5	YA19A_YEAST	reviewed	Putative uncharacterized protein YAL019W-A	YAL019W-A	Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker's yeast)	189					
A0A023PXB0	YA019_YEAST	reviewed	Putative uncharacterized protein YAR019W-A	YAR019W-A	Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker's yeast)	110					
A0A023PXB5	IRC2_YEAST	reviewed	Putative uncharacterized membrane protein IRC2 (Increased recombination centers protein 2)	IRC2 YDR112W	Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker's yeast)	102		integral component of membrane [GO:0016021]		integral component of membrane [GO:0016021]	GO:0016021
A0A023PXB9	YD99W_YEAST	reviewed	Putative uncharacterized membrane protein YDR199W	YDR199W	Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker's yeast)	121		integral component of membrane [GO:0016021]		integral component of membrane [GO:0016021]	GO:0016021
A0A023PXC2	YE53A_YEAST	reviewed	Putative uncharacterized membrane protein YEL053W-A	YEL053W-A	Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker's yeast)	115		integral component of membrane [GO:0016021]		integral component of membrane [GO:0016021]	GO:0016021
A0A023PXC7	YE068_YEAST	reviewed	Putative uncharacterized membrane protein YER068C-A	YER068C-A	Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker's yeast)	143		integral component of membrane [GO:0016021]		integral component of membrane [GO:0016021]	GO:0016021
A0A023PXD3	YE88A_YEAST	reviewed	Putative uncharacterized protein YER088C-A	YER088C-A	Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker's yeast)	107					
A0A023PXD5	YE147_YEAST	reviewed	Putative uncharacterized membrane protein YER147C-A	YER147C-A	Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker's yeast)	136		integral component of membrane [GO:0016021]		integral component of membrane [GO:0016021]	GO:0016021

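Join the second column of the first file (the UniProt accession) with the first column of the second file

The files are tab delimited, and the output should also be tab delimited (-t $'\t'). Note that join expects both files to be sorted on the join field. The blastx table was sorted on the whole line (effectively by contig, column 1), so records whose accessions are out of order can be dropped, possibly without a warning. Sorting that file on column 2 first (for example with sort -t $'\t' -k 2,2) avoids this.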

In [22]:
! join -1 2 -2 1 -t $'\t' \
../data/Olgene_blastx_uniprot.05-20191122-sort.tab \
../data/uniprot-SP-GO-sorted.tab \
> ../data/Oly_blastx_uniprot.tab
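
As a cross-check that does not depend on sort order, the same join can be sketched in Python with a dictionary lookup. This is an illustrative alternative, not the method used in the notebook. The blast rows are copied from the preview above, while the UniProt rows are truncated to two columns and the entry name for H2A0L8 is a placeholder.

```python
def join_on_accession(blast_rows, uniprot_rows):
    """Join (contig, accession, evalue) rows to UniProt rows keyed by accession.

    Output column order follows `join`: the join field first, then the
    remaining columns of file 1, then the remaining columns of file 2.
    """
    uniprot = {row[0]: row[1:] for row in uniprot_rows}
    joined = []
    for contig, accession, evalue in blast_rows:
        if accession in uniprot:
            joined.append((accession, contig, evalue, *uniprot[accession]))
    return joined

# Sorted by contig, so the accessions are NOT in sorted order
blast_rows = [
    ("Contig100018:1232-2375", "P31695", "2.23e-06"),
    ("Contig100073:8284-10076", "H2A0L8", "6.98e-24"),
]
# Truncated, partly hypothetical UniProt rows: (accession, entry name)
uniprot_rows = [
    ("H2A0L8", "PLACEHOLDER_ENTRY"),
    ("P31695", "NOTC4_MOUSE"),
]

joined = join_on_accession(blast_rows, uniprot_rows)
for row in joined:
    print("\t".join(row))
```

Comparing the number of rows from a lookup like this with `wc -l` on the `join` output is one way to see whether any records were lost to sort order.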
In [18]:
! head ../data/Oly_blastx_uniprot.tab
P31695	Contig100018:1232-2375	2.23e-06	NOTC4_MOUSE	reviewed	Neurogenic locus notch homolog protein 4 (Notch 4) [Cleaved into: Transforming protein Int-3; Notch 4 extracellular truncation; Notch 4 intracellular domain]	Notch4 Int-3 Int3	Mus musculus (Mouse)	1964	branching involved in blood vessel morphogenesis [GO:0001569]; cell differentiation [GO:0030154]; embryo development [GO:0009790]; mammary gland development [GO:0030879]; morphogenesis of a branching structure [GO:0001763]; negative regulation of endothelial cell differentiation [GO:0045602]; negative regulation of Notch signaling pathway [GO:0045746]; Notch signaling pathway [GO:0007219]; positive regulation of angiogenesis [GO:0045766]; positive regulation of aorta morphogenesis [GO:1903849]; regulation of Notch signaling pathway [GO:0008593]; regulation of protein localization [GO:0032880]; regulation of protein processing [GO:0070613]; regulation of transcription, DNA-templated [GO:0006355]; transcription, DNA-templated [GO:0006351]; venous blood vessel morphogenesis [GO:0048845]	cell surface [GO:0009986]; cytoplasmic vesicle [GO:0031410]; endoplasmic reticulum [GO:0005783]; integral component of membrane [GO:0016021]; nucleus [GO:0005634]; plasma membrane [GO:0005886]	calcium ion binding [GO:0005509]; Notch binding [GO:0005112]; receptor activity [GO:0004872]	cell surface [GO:0009986]; cytoplasmic vesicle [GO:0031410]; endoplasmic reticulum [GO:0005783]; integral component of membrane [GO:0016021]; nucleus [GO:0005634]; plasma membrane [GO:0005886]; calcium ion binding [GO:0005509]; Notch binding [GO:0005112]; receptor activity [GO:0004872]; branching involved in blood vessel morphogenesis [GO:0001569]; cell differentiation [GO:0030154]; embryo development [GO:0009790]; mammary gland development [GO:0030879]; morphogenesis of a branching structure [GO:0001763]; negative regulation of endothelial cell differentiation [GO:0045602]; negative regulation of Notch signaling pathway [GO:0045746]; Notch signaling pathway [GO:0007219]; positive regulation of angiogenesis [GO:0045766]; positive regulation of aorta morphogenesis [GO:1903849]; regulation of Notch signaling pathway [GO:0008593]; regulation of protein localization [GO:0032880]; regulation of protein processing [GO:0070613]; regulation of transcription, DNA-templated [GO:0006355]; transcription, DNA-templated [GO:0006351]; venous blood vessel morphogenesis [GO:0048845]	GO:0001569; GO:0001763; GO:0004872; GO:0005112; GO:0005509; GO:0005634; GO:0005783; GO:0005886; GO:0006351; GO:0006355; GO:0007219; GO:0008593; GO:0009790; GO:0009986; GO:0016021; GO:0030154; GO:0030879; GO:0031410; GO:0032880; GO:0045602; GO:0045746; GO:0045766; GO:0048845; GO:0070613; GO:1903849
Q2KMM2	Contig100107:1089-2009	8.78e-15	TPPC1_RAT	reviewed	Trafficking protein particle complex subunit 1	Trappc1	Rattus norvegicus (Rat)	145	ER to Golgi vesicle-mediated transport [GO:0006888]	endoplasmic reticulum [GO:0005783]; Golgi apparatus [GO:0005794]; TRAPP complex [GO:0030008]	Rab guanyl-nucleotide exchange factor activity [GO:0017112]	endoplasmic reticulum [GO:0005783]; Golgi apparatus [GO:0005794]; TRAPP complex [GO:0030008]; Rab guanyl-nucleotide exchange factor activity [GO:0017112]; ER to Golgi vesicle-mediated transport [GO:0006888]	GO:0005783; GO:0005794; GO:0006888; GO:0017112; GO:0030008
Q9V4M2	Contig100114:437-2094	1.41e-09	WECH_DROME	reviewed	Protein wech (Protein dappled)	wech dpld CG42396	Drosophila melanogaster (Fruit fly)	832	cell differentiation [GO:0030154]; instar larval development [GO:0002168]; muscle attachment [GO:0016203]; regulation of cell-cell adhesion mediated by integrin [GO:0033632]	intracellular [GO:0005622]; muscle tendon junction [GO:0005927]	protein binding, bridging [GO:0030674]; zinc ion binding [GO:0008270]	intracellular [GO:0005622]; muscle tendon junction [GO:0005927]; protein binding, bridging [GO:0030674]; zinc ion binding [GO:0008270]; cell differentiation [GO:0030154]; instar larval development [GO:0002168]; muscle attachment [GO:0016203]; regulation of cell-cell adhesion mediated by integrin [GO:0033632]	GO:0002168; GO:0005622; GO:0005927; GO:0008270; GO:0016203; GO:0030154; GO:0030674; GO:0033632
Q9Y6R7	Contig100513:1058-5433	6.07e-15	FCGBP_HUMAN	reviewed	IgGFc-binding protein (Fcgamma-binding protein antigen) (FcgammaBP)	FCGBP	Homo sapiens (Human)	5405		extracellular exosome [GO:0070062]		extracellular exosome [GO:0070062]	GO:0070062
Q9Z139	Contig105775:220-817	2.17e-08	ROR1_MOUSE	reviewed	Inactive tyrosine-protein kinase transmembrane receptor ROR1 (mROR1) (Neurotrophic tyrosine kinase, receptor-related 1)	Ror1 Ntrkr1	Mus musculus (Mouse)	937	astrocyte development [GO:0014002]; transmembrane receptor protein tyrosine kinase signaling pathway [GO:0007169]; Wnt signaling pathway [GO:0016055]	cell surface [GO:0009986]; cytoplasm [GO:0005737]; integral component of plasma membrane [GO:0005887]; receptor complex [GO:0043235]; stress fiber [GO:0001725]	ATP binding [GO:0005524]; transmembrane receptor protein tyrosine kinase activity [GO:0004714]; Wnt-protein binding [GO:0017147]	cell surface [GO:0009986]; cytoplasm [GO:0005737]; integral component of plasma membrane [GO:0005887]; receptor complex [GO:0043235]; stress fiber [GO:0001725]; ATP binding [GO:0005524]; transmembrane receptor protein tyrosine kinase activity [GO:0004714]; Wnt-protein binding [GO:0017147]; astrocyte development [GO:0014002]; transmembrane receptor protein tyrosine kinase signaling pathway [GO:0007169]; Wnt signaling pathway [GO:0016055]	GO:0001725; GO:0004714; GO:0005524; GO:0005737; GO:0005887; GO:0007169; GO:0009986; GO:0014002; GO:0016055; GO:0017147; GO:0043235
Q9Z1K7	Contig107685:373-631	5.34e-06	APCL_MOUSE	reviewed	Adenomatous polyposis coli protein 2	Apc2	Mus musculus (Mouse)	2274	activation of GTPase activity [GO:0090630]; microtubule cytoskeleton organization [GO:0000226]; negative regulation of canonical Wnt signaling pathway [GO:0090090]; negative regulation of catenin import into nucleus [GO:0035414]; Wnt signaling pathway [GO:0016055]	actin filament [GO:0005884]; catenin complex [GO:0016342]; cytoplasm [GO:0005737]; Golgi apparatus [GO:0005794]; lamellipodium membrane [GO:0031258]; microtubule [GO:0005874]; microtubule cytoskeleton [GO:0015630]; perinuclear region of cytoplasm [GO:0048471]	beta-catenin binding [GO:0008013]; microtubule binding [GO:0008017]	actin filament [GO:0005884]; catenin complex [GO:0016342]; cytoplasm [GO:0005737]; Golgi apparatus [GO:0005794]; lamellipodium membrane [GO:0031258]; microtubule [GO:0005874]; microtubule cytoskeleton [GO:0015630]; perinuclear region of cytoplasm [GO:0048471]; beta-catenin binding [GO:0008013]; microtubule binding [GO:0008017]; activation of GTPase activity [GO:0090630]; microtubule cytoskeleton organization [GO:0000226]; negative regulation of canonical Wnt signaling pathway [GO:0090090]; negative regulation of catenin import into nucleus [GO:0035414]; Wnt signaling pathway [GO:0016055]	GO:0000226; GO:0005737; GO:0005794; GO:0005874; GO:0005884; GO:0008013; GO:0008017; GO:0015630; GO:0016055; GO:0016342; GO:0031258; GO:0035414; GO:0048471; GO:0090090; GO:0090630
Q9Z2K0	Contig127694:1140-4202	6.26e-37	DEDD_RAT	reviewed	Death effector domain-containing protein (Death effector domain-containing testicular molecule)	Dedd Deft	Rattus norvegicus (Rat)	318	apoptotic process [GO:0006915]; decidualization [GO:0046697]; extrinsic apoptotic signaling pathway via death domain receptors [GO:0008625]; negative regulation of protein catabolic process [GO:0042177]; negative regulation of transcription of nuclear large rRNA transcript from RNA polymerase I promoter [GO:1901837]; regulation of apoptotic process [GO:0042981]; spermatogenesis [GO:0007283]; transcription, DNA-templated [GO:0006351]	cytoplasm [GO:0005737]; nucleolus [GO:0005730]	DNA binding [GO:0003677]	cytoplasm [GO:0005737]; nucleolus [GO:0005730]; DNA binding [GO:0003677]; apoptotic process [GO:0006915]; decidualization [GO:0046697]; extrinsic apoptotic signaling pathway via death domain receptors [GO:0008625]; negative regulation of protein catabolic process [GO:0042177]; negative regulation of transcription of nuclear large rRNA transcript from RNA polymerase I promoter [GO:1901837]; regulation of apoptotic process [GO:0042981]; spermatogenesis [GO:0007283]; transcription, DNA-templated [GO:0006351]	GO:0003677; GO:0005730; GO:0005737; GO:0006351; GO:0006915; GO:0007283; GO:0008625; GO:0042177; GO:0042981; GO:0046697; GO:1901837
W4VSJ0	Contig16086:14079-20602	1.14e-44	ACES_TRILK	reviewed	Acetylcholinesterase-1 (AChE) (EC 3.1.1.7)		Trittame loki (Brush-footed trapdoor spider)	559	neurotransmitter catabolic process [GO:0042135]	extracellular region [GO:0005576]	acetylcholinesterase activity [GO:0003990]	extracellular region [GO:0005576]; acetylcholinesterase activity [GO:0003990]; neurotransmitter catabolic process [GO:0042135]	GO:0003990; GO:0005576; GO:0042135
W8W138	Contig53758:3059-10395	7.87e-08	BACE_STRPU	reviewed	Beta-secretase (EC 3.4.23.-)	BACE	Strongylocentrotus purpuratus (Purple sea urchin)	540	protein catabolic process [GO:0030163]; proteolysis [GO:0006508]	integral component of membrane [GO:0016021]	aspartic-type endopeptidase activity [GO:0004190]	integral component of membrane [GO:0016021]; aspartic-type endopeptidase activity [GO:0004190]; protein catabolic process [GO:0030163]; proteolysis [GO:0006508]	GO:0004190; GO:0006508; GO:0016021; GO:0030163