In an attempt to figure out what’s going on with the Illumina data we recently received for these samples, I BLASTed the 400ppm data set that Steven had previously assembled de novo: EmmaBS400.fa.
Jupyter (IPython) Notebook : 20150501_Cgigas_larvae_OA_BLASTn_nt.ipynb
Notebook Viewer : 20150501_Cgigas_larvae_OA_BLASTn_nt
Results:
BLASTn Output File: 20150501_nt_blastn.tab
BLAST e-vals <= 0.001: 20150501_Cgigas_larvae_OA_blastn_evals_0.001.txt
Unique BLAST Species: 20150501_Cgigas_larvae_OA_unique_blastn_evals.txt
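For reference, the e-value filtering step can be sketched roughly as below. This is a minimal illustration, not the code from the notebook: the function name `filter_by_evalue` is hypothetical, and it assumes the tabular output is tab-delimited with the e-value in the 11th column (index 10), as in BLAST's default 12-column tabular layout. If the output file uses a custom column layout, `evalue_col` needs to be adjusted.

```python
import csv


def filter_by_evalue(lines, evalue_col=10, max_evalue=0.001):
    """Yield rows (lists of fields) whose e-value is <= max_evalue.

    `lines` is any iterable of tab-delimited text lines, e.g. an open file.
    `evalue_col` is an ASSUMPTION (default tabular layout); adjust as needed.
    """
    reader = csv.reader(lines, delimiter="\t")
    for row in reader:
        # Skip blank lines and comment lines (present in commented tabular output).
        if not row or row[0].startswith("#"):
            continue
        if float(row[evalue_col]) <= max_evalue:
            yield row
```

Usage would look something like `with open("20150501_nt_blastn.tab") as fh: kept = list(filter_by_evalue(fh))`, again assuming that column layout.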
First, since this library was bisulfite converted (unmethylated cytosines are read as thymines), we know that matching won’t be as robust as we’d normally see.
Even allowing for that, the BLAST matches are terrible.
Only 0.65% of the BLAST matches (e-value <= 0.001) are to Crassostrea gigas. Yep, you read that correctly: 0.65%.
That’s nearly 40-fold fewer matches than the top species: Dictyostelium discoideum (a slime mold)
It’s 30-fold fewer than the next species: Danio rerio (zebrafish)
Human and mouse come next.
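The percentage and fold-difference figures above come down to simple arithmetic on per-species match counts. A minimal sketch, assuming the species name for each retained match has already been pulled into a list (the helper names `species_shares` and `fold_difference` are hypothetical, not taken from the notebook):

```python
from collections import Counter


def species_shares(species_names):
    """Return {species: percent of all matches}, most common species first."""
    counts = Counter(species_names)
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.most_common()}


def fold_difference(shares, species, reference):
    """How many times more matches `reference` has than `species`."""
    return shares[reference] / shares[species]


# Toy labels only, NOT the real data: 8 matches to "A", 2 to "B".
toy_shares = species_shares(["A"] * 8 + ["B"] * 2)
toy_fold = fold_difference(toy_shares, species="B", reference="A")
```

With the real species list, `fold_difference(shares, "Crassostrea gigas", top_species)` would give the kind of fold difference quoted above.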
I think I will need to contact the Univ. of Oregon sequencing facility to see what their thoughts on this data are, because it’s not even remotely close to what we should be seeing, even with the bisulfite conversion…