--- author: Sam White toc-title: Contents toc-depth: 5 toc-location: left date: 2015-04-13 22:03:41+00:00 layout: post slug: sequence-data-c-gigas-oa-larvae-bs-seq-demultiplexed title: Sequence Data - C.gigas OA Larvae BS-Seq Demultiplexed categories: - 2015 - Crassostrea gigas larvae OA (2011) bisulfite sequencing tags: - BS-seq - Crassostrea gigas - larvae - NGS sequencing - OA - ocean acidification - Pacific oyster --- I had previously contacted Doug Turnbull at the [Univ. of Oregon Genomics Core Facility](https://gcf.uoregon.edu:8080/) for help demultiplexing this data, as it was initially returned to us as a single data set with "no index" (i.e. barcode) set for any of [the libraries that were sequenced](https://robertslab.github.io/sams-notebook/posts/2015/2015-03-20-sequencing-data-c-gigas-larvae-oa/). As it turns out, when multiplexed libraries are sequenced using the Illumina platform, an index read step needs to be "enabled" on the machine for sequencing. Otherwise, the machine does not perform the index read step (since it wouldn't be necessary for a single library). Surprisingly, the sample submission form for the [Univ. of Oregon Genomics Core Facility](http://gcf.uoregon.edu:8080/)  doesn't request any information regarding whether or not a submitted sample has been multiplexed. However, by default, they enable the index read step on all sequencing runs. I provided them with the barcodes and they demultiplexed them after the fact. I downloaded the new, demultiplexed files to [Owl/nightingales/C_gigas](https://owl.fish.washington.edu/nightingales/C_gigas/): lane2_CTTGTA_L002_R1_001.fastq.gz lane2_CTTGTA_L002_R1_002.fastq.gz lane2_CTTGTA_L002_R1_003.fastq.gz lane2_CTTGTA_L002_R1_004.fastq.gz lane2_GCCAAT_L002_R1_001.fastq.gz lane2_GCCAAT_L002_R1_002.fastq.gz lane2_GCCAAT_L002_R1_003.fastq.gz lane2_GCCAAT_L002_R1_004.fastq.gz lane2_GCCAAT_L002_R1_005.fastq.gz lane2_GCCAAT_L002_R1_006.fastq.gz Notice that the file names now contain the corresponding index! Renamed the files, to append the order number to the beginning of the file names: `$for file in lane2*; do mv "$file" "2212_$file"; done` New file names: 2212_lane2_CTTGTA_L002_R1_001.fastq.gz 2212_lane2_CTTGTA_L002_R1_002.fastq.gz 2212_lane2_CTTGTA_L002_R1_003.fastq.gz 2212_lane2_CTTGTA_L002_R1_004.fastq.gz 2212_lane2_GCCAAT_L002_R1_001.fastq.gz 2212_lane2_GCCAAT_L002_R1_002.fastq.gz 2212_lane2_GCCAAT_L002_R1_003.fastq.gz 2212_lane2_GCCAAT_L002_R1_004.fastq.gz 2212_lane2_GCCAAT_L002_R1_005.fastq.gz 2212_lane2_GCCAAT_L002_R1_006.fastq.gz Updated the [checksums.md5](https://owl.fish.washington.edu/nightingales/C_gigas/checksums.md5) file to include the new files (the command is written to exclude the previously downloaded files that are named "2212_lane2_NoIndex_"; the [^N] regex excludes any files that have a capital 'N' at that position in the file name): `$for file in 2212_lane2_[^N]*; do md5 "$file" >> checksums.md5; done` Updated the [readme.md](https://owl.fish.washington.edu/nightingales/C_gigas/readme.md) file to reflect the addition of these new files.