--- author: Sam White toc-title: Contents toc-depth: 5 toc-location: left date: 2017-03-07 23:36:53+00:00 layout: post slug: computing-oly-bgi-gbs-reproducibility-fail title: Computing - Oly BGI GBS Reproducibility Fail categories: - 2017 - Genotype-by-sequencing at BGI tags: - BGI - GBS - genotype-by-sequencing - iTools - olympia oyster - Ostrea lurida --- Since we're preparing a manuscript that relies on BGI's manipulation/handling of the genotype-by-sequencing data, I attempted to could reproduce the demultiplexing steps that BGI used in order to perform the SNP/genotyping on these samples. The key word in the above sentence is "attempted." Ugh, what a massive waste of time it turned out to be. I've contacted BGI to get some help on this. In the meantime, here's a brief (actually, not as brief as I'd like) rundown of my struggles. The demultiplexing software that BGI used is something called "iTools" which is bundled in this GitHub repo: [Resqtools](https://github.com/BGI-shenzhen/Reseqtools) To demutliplex, they ran a script called: [split.sh](https://owl.fish.washington.edu/nightingales/O_lurida/20160223_gbs/split.sh) The script seems fairly straightforward. Here is what it contains: iTools Fqtools splitpool \ -InFq1 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz \ -InFq2 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2.fq.gz \ -Index index.lst \ -Flag enzyme.txt \ -MisMatch \ -OutDir split It tells the iTools program to use the Fqtools tool "splitpool" to operate on a pair of gzipped FASTQ files. It also utilizes an index file ([index.lst](https://owl.fish.washington.edu/nightingales/O_lurida/20160223_gbs/index.lst)) which contains all the barcodes needed to identify, and separate, the individual samples that were combined prior to sequencing. The first bump in the road is the `-Flag enzyme.txt` portion of the code. BGI did not provide me with this file. I recently requested them to send me it (or its contents, since I suspected it was only a single line text file). They sent me the contents of the file: CAGC CTGC The next problem is neither of those two sequences are the recognition site for the enzyme that was (supposedly) used: ApeKI. The recognition site for ApeKI is: GCWGC Regardless, I decided to see if I could reproduce the demultiplexing using the info they'd provided me. I cloned the [Resqtools](https://github.com/BGI-shenzhen/Reseqtools) repo, changed into the Reseqtools/iTools directory and typed `make`. This resulted in an error informing me that it could not find `boost/spirit/core.hpp` I tracked down the [Boost](https://www.boost.org/) library junk, downloaded the newest version and untarred it in /usr/local/bin. Tried to run `make` in the Reseqtools/iTools directory and got the same error. Realized iTools might not be searching the system `$PATH` (this turned out to be correct), so I moved the contents of the Boost folder to the iTools, ran `make` and got the same error. Turns out, the newest version of Boost doesn't have that `core.hpp` file any more. Looking at the iTools documentation, iTools was built around Boost 1.44. OMG... Downloaded Boost 1.44 and went through the same steps as above. This eliminated the missing `core.hpp` error! But, of course, led to another error. The error: "Threading support unavaliable: it has been explicitly disabled with BOOST_DISABLE_THREADS" That was related to something with newer versions of the GCC compiler (this is, essentially, built into the computer; it's not worth trying to install/use old versions of GCC) trying to work with old versions of Boost. Found a patch for a config file here: [libstdcpp3.hpp.patch](https://svn.boost.org/trac/boost/attachment/ticket/6165/libstdcpp3.hpp.patch) I made the appropriate edits to the file as shown in that link and ran `make` and it almost worked! The current error is: ./src/Variants/soapsv-v1.02/include.h:15:16: fatal error: gd.h: No such file or directory I gave up and contacted BGI to see if they can get me a functional version of iTools...