---
author: Sam White
toc-title: Contents
toc-depth: 5
toc-location: left
date: 2018-05-01 22:06:32+00:00
layout: post
slug: assembly-stats-sparseassembler-k95-on-geoduck-sequence-data-quast-for-stats
title: Assembly & Stats - SparseAssembler (k95) on Geoduck Sequence Data > Quast for Stats
categories:
  - 2018
  - Geoduck Genome Sequencing
tags:
  - geoduck
  - mox
  - Panopea generosa
  - QUAST
  - SparseAssembler
---

[Had a successful assembly with SparseAssembler k101](https://robertslab.github.io/sams-notebook/posts/2018/2018-04-05-genome-assembly-sparseassembler-geoduck-genomic-data-kmer101/), but figured I'd just tweak the kmer setting, throw it in the queue, and see how it compares; minimal effort/time needed.

Initiated an assembly run using [SparseAssembler](https://github.com/yechengxi/SparseAssembler) on our [Mox HPC node](https://github.com/RobertsLab/hyak_mox/wiki) with all of our geoduck genomic sequencing data:

* [BGI HiSeq Data](https://robertslab.github.io/sams-notebook/posts/2018/2018-03-27-fastqcmultiqc-bgi-geoduck-genome-sequencing-data/)

* [Illumina Mate Pair HiSeq Data](https://robertslab.github.io/sams-notebook/posts/2018/2018-04-01-trimgalorefastqcmultiqc-illumina-hiseq-genome-sequencing-data-continued/)

* [Illumina NovaSeq Data](https://robertslab.github.io/sams-notebook/posts/2018/2018-01-25-adapter-trimming-and-fastqc-illumina-geoduck-novaseq-data/)

Kmer size set to 95.
Slurm script: [20180423_sparse_assembler_kmer95_geoduck_slurm.sh](https://owl.fish.washington.edu/Athaliana/20180423_sparseassembler_kmer95_geoduck/20180423_sparse_assembler_kmer95_geoduck_slurm.sh)

After the run finished, I copied the files to our server (Owl) and then ran Quast on my computer to gather some assembly stats, using the following command:

    /home/sam/software/quast-4.5/quast.py \
    -t 24 \
    --labels 20180423_sparse_k95 \
    /mnt/owl/Athaliana/20180423_sparseassembler_kmer95_geoduck/Contigs.txt

* * *

##### Results:

SparseAssembler output folder: [20180423_sparseassembler_kmer95_geoduck/](https://owl.fish.washington.edu/Athaliana/20180423_sparseassembler_kmer95_geoduck/)

SparseAssembler assembly (FastA; 15GB): [20180423_sparseassembler_kmer95_geoduck/Contigs.txt](https://owl.fish.washington.edu/Athaliana/20180423_sparseassembler_kmer95_geoduck/Contigs.txt)

Quast output folder: [quast_results/results_2018_05_10_15_04_07](https://owl.fish.washington.edu/Athaliana/quast_results/results_2018_05_10_15_04_07/)

Quast report (HTML): [quast_results/results_2018_05_10_15_04_07/report.html](https://owl.fish.washington.edu/Athaliana/quast_results/results_2018_05_10_15_04_07/report.html)

I've embedded the Quast HTML report below, but it may be easier to view by using the link above.

Well, it's remarkable how different this is from the [previous SparseAssembler run with the k101 setting](https://robertslab.github.io/sams-notebook/posts/2018/2018-04-05-genome-assembly-sparseassembler-geoduck-genomic-data-kmer101/)! This assembly doesn't have a single contig >50,000bp, while the previous one has four contigs over that threshold! Definitely shows what a large impact the kmer setting in assembly software can have on the final assembly!
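
As a quick sanity check outside of Quast, the >50,000bp contig count can also be tallied straight from the FastA file. The snippet below is just a rough sketch, not something that was run for this post: `count_long_contigs` is a hypothetical helper name, it assumes a standard FastA file (header lines starting with `>`, sequences possibly wrapped across multiple lines), and it counts only contigs strictly longer than the threshold.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of Quast or SparseAssembler):
# count FastA records whose total sequence length is greater than a threshold.
# Usage: count_long_contigs <fasta> [min_length]   (min_length defaults to 50000)
count_long_contigs() {
  local fasta="$1"
  local min_len="${2:-50000}"

  awk -v min="$min_len" '
    # Header line: score the previous record, then start a new one
    /^>/ { if (seen && len > min) n++; len = 0; seen = 1; next }
    # Sequence line: add its length to the current record
    { len += length($0) }
    # End of file: score the final record and print the count
    END { if (seen && len > min) n++; print n + 0 }
  ' "$fasta"
}
```

For example, pointed at the assembly from this post:

    count_long_contigs /mnt/owl/Athaliana/20180423_sparseassembler_kmer95_geoduck/Contigs.txt 50000

Running the same thing on the k101 `Contigs.txt` would give a direct side-by-side of the two assemblies at that threshold.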