--- author: Sam White toc-title: Contents toc-depth: 5 toc-location: left layout: post title: Annotation - Olurida_v081 MAKER ID Mapping date: '2019-01-08 07:44' tags: - MAKER - Ostrea lurida - Olympia oyster - mox categories: - 2019 - Olympia Oyster Genome Sequencing --- [After getting through the initial MAKER annotation and SNAP gene predictions](https://robertslab.github.io/sams-notebook/posts/2018/2018-11-27-Annotation---Olurida_v081-MAKER-on-Mox/), I needed (wanted?) to simplify annotation names that will be easier to read and are in a more standardized format; similiar to NCBI. MAKER provides this functionality. Ran the following SBATCH script on Mox: - [20190108_oly_maker_id_mapping/20190108_oly_maker_id_mapping.sh](https://gannet.fish.washington.edu/Atumefaciens/20190108_oly_maker_id_mapping/20190108_oly_maker_id_mapping.sh) (text file)
#!/bin/bash
## Job Name
#SBATCH --job-name=maker
## Allocation Definition
#SBATCH --account=srlab
#SBATCH --partition=srlab
## Resources
## Nodes
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=15-00:00:00
## Memory per node
#SBATCH --mem=120G
##turn on e-mail notification
#SBATCH --mail-type=ALL
#SBATCH --mail-user=samwhite@uw.edu
## Specify the working directory for this job
#SBATCH --workdir=/gscratch/scrubbed/samwhite/outputs/20190108_oly_maker_id_mapping
# Load Python Mox module for Python module availability
module load intel-python3_2017
# Load Open MPI module for parallel, multi-node processing
module load icc_19-ompi_3.1.2
# SegFault fix?
export THREADS_DAEMON_MODEL=1
# Document programs in PATH (primarily for program version ID)
date >> system_path.log
echo "" >> system_path.log
echo "System PATH for $SLURM_JOB_ID" >> system_path.log
echo "" >> system_path.log
printf "%0.s-" {1..10} >> system_path.log
echo ${PATH} | tr : \\n >> system_path.log
# Variables
maker_dir=/gscratch/srlab/programs/maker-2.31.10/bin
maker_prot_fasta=/gscratch/scrubbed/samwhite/outputs/20181127_oly_maker_genome_annotation/snap02/20181127_oly_genome_snap02.all.maker.proteins.fasta
maker_transcripts_fasta=/gscratch/scrubbed/samwhite/outputs/20181127_oly_maker_genome_annotation/snap02/20181127_oly_genome_snap02.all.maker.transcripts.fasta
snap02_gff=/gscratch/scrubbed/samwhite/outputs/20181127_oly_maker_genome_annotation/snap02/20181127_oly_genome_snap02.all.gff
cp ${maker_prot_fasta} 20181127_oly_genome_snap02.all.maker.proteins.renamed.fasta
cp ${maker_transcripts_fasta} 20181127_oly_genome_snap02.all.maker.transcripts.renamed.fasta
cp ${snap02_gff} 20181127_oly_genome_snap02.all.renamed.gff
# Run MAKER programs
## Change gene names
${maker_dir}/maker_map_ids \
--prefix OLUR_ \
--justify 8 \
${snap02_gff} \
> 20181127_oly_genome.map
## Map GFF IDs
${maker_dir}/map_gff_ids \
20181127_oly_genome.map \
20181127_oly_genome_snap02.all.renamed.gff
## Map FastAs
### Proteins
${maker_dir}/map_fasta_ids \
20181127_oly_genome.map \
20181127_oly_genome_snap02.all.maker.proteins.renamed.fasta
### Transcripts
${maker_dir}/map_fasta_ids \
20181127_oly_genome.map \
20181127_oly_genome_snap02.all.maker.transcripts.renamed.fasta
---
# RESULTS
Output folder:
- [20190108_oly_maker_id_mapping/](https://gannet.fish.washington.edu/Atumefaciens/20190108_oly_maker_id_mapping/)
Mapping file (text):
- [20190108_oly_maker_id_mapping/20181127_oly_genome.map](https://gannet.fish.washington.edu/Atumefaciens/20190108_oly_maker_id_mapping/20181127_oly_genome.map) (2.7MB)
Renamed GFF:
- [20190108_oly_maker_id_mapping/20181127_oly_genome_snap02.all.renamed.gff](https://gannet.fish.washington.edu/Atumefaciens/20190108_oly_maker_id_mapping/20181127_oly_genome_snap02.all.renamed.gff) (2.3GB)
Renamed protein FastA:
- [20190108_oly_maker_id_mapping/20181127_oly_genome_snap02.all.maker.proteins.renamed.fasta](https://gannet.fish.washington.edu/Atumefaciens/20190108_oly_maker_id_mapping/20181127_oly_genome_snap02.all.maker.proteins.renamed.fasta) (10MB)
Renamed transcripts FastA:
- [20190108_oly_maker_id_mapping/20181127_oly_genome_snap02.all.maker.transcripts.renamed.fasta](https://gannet.fish.washington.edu/Atumefaciens/20190108_oly_maker_id_mapping/20181127_oly_genome_snap02.all.maker.transcripts.renamed.fasta) (30MB)
Here's a look at the files to show that this worked.
---
- Map file
###### This files is used as a key for the name conversions, original names on left, new names on right
![head map file](https://raw.githubusercontent.com/RobertsLab/sams-notebook/master/images/screencaps/20190108_001.png)
---
- Proteins FastA
###### Note the name update at the beginning of the FastA descriptions: ```Olurida_#########-RA```
![head proteins FastA](https://raw.githubusercontent.com/RobertsLab/sams-notebook/master/images/screencaps/20190108_002.png)
---
- Transcripts FastA
###### Note the name update at the beginning of the FastA descriptions: ```Olurida_#########-RA```
![head transcripts FastA](https://raw.githubusercontent.com/RobertsLab/sams-notebook/master/images/screencaps/20190108_003.png)
---
- GFF
###### The new IDs are present but only shown for gene models:
![head GFF](https://raw.githubusercontent.com/RobertsLab/sams-notebook/master/images/screencaps/20190108_004.png)
![gene models GFF](https://raw.githubusercontent.com/RobertsLab/sams-notebook/master/images/screencaps/20190108_005.png)
Regardless, the next steps are to use the newly labeled protein FastA file for BLASTp and protein domain identification. Will move forward with those two steps.