---
author: Sam White
toc-title: Contents
toc-depth: 5
toc-location: left
date: 2015-04-14 05:53:30+00:00
layout: post
slug: sequence-data-analysis-lsu-c-virginica-oil-spill-mbd-bs-seq-data
title: Sequence Data Analysis - LSU C.virginica Oil Spill MBD BS-Seq Data
categories:
  - 2015
  - LSU C.virginica Oil Spill MBD BS Sequencing
tags:
  - BS-seq
  - Crassostrea virginica
  - Eastern oyster
  - FASTQC
  - LSU
  - MBD-Seq
  - NGS sequencing
  - oil
---

Performed some rudimentary data analysis on the new, demultiplexed data downloaded earlier today:

2112_lane1_ACAGTG_L001_R1_001.fastq.gz
2112_lane1_ACAGTG_L001_R1_002.fastq.gz
2112_lane1_ATCACG_L001_R1_001.fastq.gz
2112_lane1_ATCACG_L001_R1_002.fastq.gz
2112_lane1_ATCACG_L001_R1_003.fastq.gz
2112_lane1_CAGATC_L001_R1_001.fastq.gz
2112_lane1_CAGATC_L001_R1_002.fastq.gz
2112_lane1_CAGATC_L001_R1_003.fastq.gz
2112_lane1_GCCAAT_L001_R1_001.fastq.gz
2112_lane1_GCCAAT_L001_R1_002.fastq.gz
2112_lane1_TGACCA_L001_R1_001.fastq.gz
2112_lane1_TTAGGC_L001_R1_001.fastq.gz
2112_lane1_TTAGGC_L001_R1_002.fastq.gz

Compared the total amount of data (in gigabytes) generated from each index. The commands below pipe the output of 'ls -l' to awk. Awk sums the file sizes, found in the 5th field ($5) of the 'ls -l' output, then prints the sum divided by 1024^3 to convert from bytes to gigabytes. The number after each command is its output.

Index: ACAGTG

`$ ls -l 2112_lane1_AC* | awk '{sum += $5} END {print sum/1024/1024/1024}'
1.49652`

Index: ATCACG

`$ ls -l 2112_lane1_AT* | awk '{sum += $5} END {print sum/1024/1024/1024}'
3.02269`

Index: CAGATC

`$ ls -l 2112_lane1_CA* | awk '{sum += $5} END {print sum/1024/1024/1024}'
3.49797`

Index: GCCAAT

`$ ls -l 2112_lane1_GC* | awk '{sum += $5} END {print sum/1024/1024/1024}'
2.21379`

Index: TGACCA

`$ ls -l 2112_lane1_TG* | awk '{sum += $5} END {print sum/1024/1024/1024}'
0.687374`

Index: TTAGGC

`$ ls -l 2112_lane1_TT* | awk '{sum += $5} END {print sum/1024/1024/1024}'
2.28902`

Ran FASTQC on the files listed above.
The FASTQC command is below. This command runs FASTQC in a for loop over any files that begin with "2112_lane1_" followed by A, T, C, or G, and outputs the analyses to the Arabidopsis folder on Eagle:

`$ for file in /Volumes/nightingales/C_virginica/2112_lane1_[ATCG]*; do fastqc "$file" --outdir=/Volumes/Eagle/Arabidopsis/; done`

From within the Eagle/Arabidopsis folder, I renamed the FASTQC output files to prepend today's date:

`$ for file in 2112_lane1_[ATCG]*; do mv "$file" "20150413_$file"; done`

Then, I unzipped the .zip files generated by FASTQC to get access to the images, which eliminates the need for screenshots in this notebook entry:

`$ for file in 20150413_2112_lane1_[ATCG]*.zip; do unzip "$file"; done`

The unzipped folders retained the old naming scheme, so I renamed them as well:

`$ for file in 2112_lane1_[ATCG]*; do mv "$file" "20150413_$file"; done`

The FASTQC results are linked below:

[20150413_2112_lane1_ACAGTG_L001_R1_001_fastqc.html](https://eagle.fish.washington.edu/Arabidopsis/20150413_2112_lane1_ACAGTG_L001_R1_001_fastqc.html)
[20150413_2112_lane1_ACAGTG_L001_R1_002_fastqc.html](https://eagle.fish.washington.edu/Arabidopsis/20150413_2112_lane1_ACAGTG_L001_R1_002_fastqc.html)
[20150413_2112_lane1_ATCACG_L001_R1_001_fastqc.html](https://eagle.fish.washington.edu/Arabidopsis/20150413_2112_lane1_ATCACG_L001_R1_001_fastqc.html)
[20150413_2112_lane1_ATCACG_L001_R1_002_fastqc.html](https://eagle.fish.washington.edu/Arabidopsis/20150413_2112_lane1_ATCACG_L001_R1_002_fastqc.html)
[20150413_2112_lane1_ATCACG_L001_R1_003_fastqc.html](https://eagle.fish.washington.edu/Arabidopsis/20150413_2112_lane1_ATCACG_L001_R1_003_fastqc.html)
[20150413_2112_lane1_CAGATC_L001_R1_001_fastqc.html](https://eagle.fish.washington.edu/Arabidopsis/20150413_2112_lane1_CAGATC_L001_R1_001_fastqc.html)
[20150413_2112_lane1_CAGATC_L001_R1_002_fastqc.html](https://eagle.fish.washington.edu/Arabidopsis/20150413_2112_lane1_CAGATC_L001_R1_002_fastqc.html)
[20150413_2112_lane1_CAGATC_L001_R1_003_fastqc.html](https://eagle.fish.washington.edu/Arabidopsis/20150413_2112_lane1_CAGATC_L001_R1_003_fastqc.html)
[20150413_2112_lane1_GCCAAT_L001_R1_001_fastqc.html](https://eagle.fish.washington.edu/Arabidopsis/20150413_2112_lane1_GCCAAT_L001_R1_001_fastqc.html)
[20150413_2112_lane1_GCCAAT_L001_R1_002_fastqc.html](https://eagle.fish.washington.edu/Arabidopsis/20150413_2112_lane1_GCCAAT_L001_R1_002_fastqc.html)
[20150413_2112_lane1_TGACCA_L001_R1_001_fastqc.html](https://eagle.fish.washington.edu/Arabidopsis/20150413_2112_lane1_TGACCA_L001_R1_001_fastqc.html)
[20150413_2112_lane1_TTAGGC_L001_R1_001_fastqc.html](https://eagle.fish.washington.edu/Arabidopsis/20150413_2112_lane1_TTAGGC_L001_R1_001_fastqc.html)
[20150413_2112_lane1_TTAGGC_L001_R1_002_fastqc.html](https://eagle.fish.washington.edu/Arabidopsis/20150413_2112_lane1_TTAGGC_L001_R1_002_fastqc.html)
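
Side note: the six per-index size commands near the top of this entry could probably be collapsed into a single loop. Below is a minimal sketch, not what was actually run. It assumes bash and that the FASTQ files sit in the current directory (or in a directory passed as the first argument). `sum_bytes`, `to_gib`, and `data_dir` are hypothetical names introduced here, and `wc -c` is swapped in for parsing the 5th field of `ls -l`.

```shell
#!/usr/bin/env bash
# Sketch: report the total size per index, in bytes / 1024^3.
# sum_bytes, to_gib, and data_dir are hypothetical names (not from the original entry).

# Print the combined size, in bytes, of all regular files given as arguments.
sum_bytes() {
  local total=0 f
  for f in "$@"; do
    [ -f "$f" ] || continue                 # skip a glob that matched nothing
    total=$(( total + $(wc -c < "$f") ))    # wc -c reports the file's byte count
  done
  printf '%s\n' "$total"
}

# Convert a byte count to bytes / 1024^3, printed with five decimal places.
to_gib() {
  awk -v bytes="$1" 'BEGIN { printf "%.5f\n", bytes / 1024 / 1024 / 1024 }'
}

# Directory holding the FASTQ files; defaults to the current directory (assumption).
data_dir="${1:-.}"

for index in ACAGTG ATCACG CAGATC GCCAAT TGACCA TTAGGC; do
  bytes=$(sum_bytes "$data_dir"/2112_lane1_"$index"_*.fastq.gz)
  printf '%s\t%s\n' "$index" "$(to_gib "$bytes")"
done
```

Run from the folder containing the files, this should print one tab-separated line per index. Matching on the full six-base index, rather than the first two letters as in the commands above, avoids accidentally lumping indexes together if a future run includes two that share a prefix.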