---
author: Sam White
toc-title: Contents
toc-depth: 5
toc-location: left
date: 2018-08-08 22:55:49+00:00
layout: post
slug: genome-annotation-olympia-oyster-genome-annotation-results-02
title: 'Genome Annotation – Olympia oyster genome annotation results #02'
categories:
  - 2018
  - Olympia Oyster Genome Sequencing
tags:
  - annotation
  - JetStream
  - maker
  - olympia oyster
  - Ostrea lurida
  - wq-maker
---

Yesterday, [I annotated our Olympia oyster genome using WQ-MAKER in just 7hrs!](https://robertslab.github.io/sams-notebook/posts/2018/2018-08-07-genome-annotation-olympia-oyster-genome-using-wq-maker-instance-on-jetstream/).

See that link for run setup and configuration. They are essentially the same, except for the change I'll discuss below.

The results from that run can be seen here:


  * [Genome Annotation – Olympia oyster genome annotation results #01](https://robertslab.github.io/sams-notebook/posts/2018/2018-08-08-genome-annotation-olympia-oyster-genome-annotation-results-01/)


In that previous run, I neglected to provide a transposable elements FastA file for use with RepeatMasker.

I remedied that and re-ran it. I modified [`maker_opts.ctl`](https://owl.fish.washington.edu/Athaliana/20180807_wqmaker_run_oly_02/maker_opts.ctl) to include the following:


    <code>repeat_protein=../../opt/maker/data/te_proteins.fasta #provide a fasta file of transposable element proteins for RepeatRunner</code>


This TEs file is part of RepeatMasker.


* * *


# RESULTS


Output folder:


  * [20180807_wqmaker_run_oly_02](https://owl.fish.washington.edu/Athaliana/20180807_wqmaker_run_oly_02/)


Annotated genome file (GFF):


  * [20180807_wqmaker_run_oly_02/Olurida_v081.all.gff (1GB)(https://owl.fish.washington.edu/Athaliana/20180807_wqmaker_run_oly_02/Olurida_v081.all.gff)


* * *


![](https://owl.fish.washington.edu/Athaliana/20180807_wq-maker_06.png)


* * *


This run took about an hour longer than [the previous run](https://robertslab.github.io/sams-notebook/posts/2018/2018-08-07-genome-annotation-olympia-oyster-genome-complete-brief-note/), but for some reason it ran with only 21 workers, instead of 22. This is probably the reason for the increased run time.

I'd like to post a snippet of the GFF file here, but the line lengths are WAY too long and will be virtually impossible to read in this notebook. The GFF consists of listing a "parent" contig and its corresponding info (start/stop/length). Then, there are "children" of this contig that show various regions that are matched within the various databases that were queried, i.e. repeatmasker annotations for identifying repeat regions, protein2genome for full/partial protein matches, etc. Thus, a single scaffold (contig) can have dozens or hundreds of corresponding annotations!

Probably the easiest and most logical to start working with will be those scaffolds that are annotated with a "protein_match", as these have a corresponding GenBank ID. Parsing these out and then doing a join with NCBI protein IDs will give us a basic annotaiton of "functional" portions of the genome.

Additionally, we should probably do some sort of comparison of this run with the previous run where I did _not_ provide the transposable elements FastA file.