---
author: Sam White
toc-title: Contents
toc-depth: 5
toc-location: left
date: 2018-08-08 22:54:49+00:00
layout: post
slug: genome-annotation-olympia-oyster-genome-annotation-results-01
title: 'Genome Annotation - Olympia oyster genome annotation results #01'
categories:
  - 2018
  - Olympia Oyster Genome Sequencing
tags:
  - annotation
  - genome
  - maker
  - olympia oyster
  - Ostrea lurida
  - wq-maker
---

Yesterday, [I annotated our Olympia oyster genome using WQ-MAKER in just 7hrs!](https://robertslab.github.io/sams-notebook/posts/2018/2018-08-07-genome-annotation-olympia-oyster-genome-using-wq-maker-instance-on-jetstream/).

See that link for run setup and configuration.


* * *


##### RESULTS


Before proceeding further, it should be noted that I neglected to provide Maker with a transposable elements FastA file for RepeatMasker to use.

The following line in the [`maker_opts.ctl`](https://owl.fish.washington.edu/Athaliana/20180807_wqmaker_run_oly_01/maker_opts.ctl) file was originally populated with an absolute path to data I didn't recognize, so I removed it:


    <code>repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner</code>


I'm not entirely sure what the impacts will be on annotation, so I've re-run Maker with that line restored (using a relative path). You can find the results of that run here:


  * [Genome Annotation – Olympia oyster genome annotation results #02](https://robertslab.github.io/sams-notebook/posts/2018/2018-08-08-genome-annotation-olympia-oyster-genome-annotation-results-02/)


Output folder:


  * [20180807_wqmaker_run_oly_01](https://owl.fish.washington.edu/Athaliana/20180807_wqmaker_run_oly_01/)


Annotated genome file (GFF):


  * [20180807_wqmaker_run_oly_01/Olurida_v081.all.gff (1GB)(https://owl.fish.washington.edu/Athaliana/20180807_wqmaker_run_oly_01/Olurida_v081.all.gff)


I'd like to post a snippet of the GFF file here, but the line lengths are WAY too long and will be virtually impossible to read in this notebook. The GFF consists of listing a "parent" contig and its corresponding info (start/stop/length). Then, there are "children" of this contig that show various regions that are matched within the various databases that were queried, i.e. repeatmasker annotations for identifying repeat regions, protein2genome for full/partial protein matches, etc. Thus, a single scaffold (contig) can have dozens or hundreds of corresponding annotations!

Probably the easiest and most logical approach from here is to start working with scaffolds that are annotated with a "protein_match", as these have a corresponding GenBank ID. Parsing these out and then doing a join with a database of NCBI protein IDs will give us a basic annotation of "functional" portions of the genome.

Additionally, we should probably do some sort of comparison of this run with the follow up run where I provided the transposable elements FastA file to see what impacts the exclusion/inclusion of that info had on annotation.