**Assembly**

De novo assembly produced 154,407 transcripts (100,157 components) with a mean contig length of 660 bp, an N50 of 1,014 bp (Fig.), and a GC content of 36.97% *(needed?)*. The corresponding proteome was predicted to contain 35,951 proteins. However, only 19,652 *(check with annotation results later, see issue Data difference)*

<img src="https://github.com/mdelrio1/mdelrio-panopea1/blob/master/img/seq-distribution-frequency.png" width="50%"/>

Fig. Frequency distribution of assembled contigs from *Panopea generosa* 

17,288 proteins annotated against Swiss-Prot (SP)
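For readers who want to double-check the contig statistics quoted above, here is a minimal Python sketch of how N50, mean and median contig length can be computed from a FASTA file. It uses the common definition of Nx (the contig length at which the cumulative length of the contigs, sorted from longest to shortest, first reaches x% of all assembled bases). That `TrinityStats.pl` uses exactly this definition is an assumption, and the helper names `read_fasta_lengths` and `nx` are hypothetical, not part of Trinity.

```python
from statistics import mean, median


def read_fasta_lengths(path):
    """Return the sequence length of every record in a FASTA file."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def nx(lengths, fraction=0.5):
    """Contig length at which the cumulative length of the longest contigs
    first reaches `fraction` of the total assembled bases (0.5 gives N50)."""
    threshold = fraction * sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    raise ValueError("no contig lengths given")


if __name__ == "__main__":
    # Toy lengths for illustration only, not data from this assembly.
    toy = [100, 200, 300, 400, 500]
    print("N50:", nx(toy))        # 500 + 400 = 900 >= 750, so N50 is 400
    print("mean:", mean(toy))
    print("median:", median(toy))
```

If the definition matches, running `nx(read_fasta_lengths(...))` on the assembled `Trinity.fasta` should give a value comparable to the Contig N50 in the TrinityStats output below, and `nx(lengths, 0.1)` would correspond to N10.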




```
/Users/gilesg/compile/trinityrnaseq_r20131110/util/TrinityStats.pl /Volumes/web-1/cnidarian/Geo-Trinity2/trinity_out_dir/Trinity.fasta


################################
## Counts of transcripts, etc.
################################
Total trinity transcripts:	154407
Total trinity components:	100157
Percent GC: 36.97

########################################
Stats based on ALL transcript contigs:
########################################

	Contig N10: 3473
	Contig N20: 2393
	Contig N30: 1771
	Contig N40: 1344
	Contig N50: 1014

	Median contig length: 371
	Average contig: 659.53
	Total assembled bases: 101836734


#####################################################
## Stats based on ONLY LONGEST ISOFORM per COMPONENT:
#####################################################

	Contig N10: 3002
	Contig N20: 2031
	Contig N30: 1461
	Contig N40: 1067
	Contig N50: 768

	Median contig length: 320
	Average contig: 553.99
	Total assembled bases: 55486238
```