---
title: "Step 1: Get Genome Data using `curl` and `wget`"
subtitle: "Use `curl` and `wget` to pull a reference genome from Rutgers University"
author: "Sarah Tanja"
date: "`r format(Sys.time(), '%d %B, %Y')`"
format: gfm
---
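# Overview

This notebook downloads the *Montipora capitata* V3 reference genome assembly (FASTA) and gene annotation (GFF3) from Rutgers University, decompresses them, and checks their integrity with `md5sum`.

# Genome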
*Montipora capitata* Genome version V3, Rutgers University:
[`Montipora_capitata_HIv3.assembly.fasta`](https://owl.fish.washington.edu/halfshell/genomic-databank/Montipora_capitata_HIv3.assembly.fasta) (745MB)
- MD5 checksum: `99819eadba1b13ed569bb902eef8da08`
Genome publication:
::: callout-note
A compiled list of genomic resources for *Montipora capitata* is available in the [Roberts Lab handbook](https://robertslab.github.io/resources/Genomic-Resources/#montipora-capitata:~:text=mmag_pilon_scaffolds.fasta.fai-,Montipora%20capitata,-Genomes%3A).
:::
This code downloads the gzipped *Montipora capitata* V3 genome assembly FASTA from the Rutgers server into `../input/genome`.
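```{r, engine='bash'}
# create the genome data directory if it doesn't exist yet, then move into it
mkdir -p ../input/genome
cd ../input/genome
# download the gzipped V3 genome assembly from the Rutgers server
wget http://cyanophora.rutgers.edu/montipora/Montipora_capitata_HIv3.assembly.fasta.gz
```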
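Re-running the download chunk (for example, when re-rendering the notebook) fetches the 745MB archive again, and if the `.gz` is still present `wget` saves the new copy as `Montipora_capitata_HIv3.assembly.fasta.gz.1` instead of overwriting it. The sketch below is one way to guard against that, assuming you want to skip any file already present in either its compressed or decompressed form. `fetch_if_missing` is a hypothetical helper, not part of the original workflow; it uses `curl` when available and falls back to `wget`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the original workflow): download a URL into
# the current directory only if neither the archive nor its decompressed copy
# (same name with a trailing .gz removed) is already there.
fetch_if_missing() {
  local url="$1"
  local file="${url##*/}"       # basename of the URL, e.g. foo.fasta.gz
  local unzipped="${file%.gz}"  # same name without a trailing .gz

  if [[ -e "$file" || -e "$unzipped" ]]; then
    echo "skip: $file already present"
    return 0
  fi

  if command -v curl >/dev/null 2>&1; then
    # -f: fail on HTTP errors, -L: follow redirects, -O: keep the remote file name
    curl -fLO "$url"
  else
    wget "$url"
  fi
}
```

Usage, from inside `../input/genome`: `fetch_if_missing http://cyanophora.rutgers.edu/montipora/Montipora_capitata_HIv3.assembly.fasta.gz`. Checking for the decompressed name as well means the helper still skips the download after `gunzip` has removed the archive.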
# GFF
[`Montipora_capitata_HIv3.genes.gff3`](https://owl.fish.washington.edu/halfshell/genomic-databank/Montipora_capitata_HIv3.genes.gff3) (67MB)
- MD5 checksum: `5f6b80ba2885471c8c1534932ccb7e84`
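Download the original gene annotation from Rutgers, plus a fixed version (`Montipora_capitata_HIv3.genes_fixed.gff3`) hosted in the [AHuffmyer/EarlyLifeHistory_Energetics](https://github.com/AHuffmyer/EarlyLifeHistory_Energetics) GitHub repository.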
```{r, engine='bash'}
cd ../input/genome
# original V3 gene annotation from the Rutgers server
wget http://cyanophora.rutgers.edu/montipora/Montipora_capitata_HIv3.genes.gff3.gz
# fixed version of the same annotation, hosted on GitHub
wget https://github.com/AHuffmyer/EarlyLifeHistory_Energetics/raw/master/Mcap2020/Data/TagSeq/Montipora_capitata_HIv3.genes_fixed.gff3.gz
```
The `_fixed` file was generated by running the original GFF file `Montipora_capitata_HIv3.genes.gff3.gz` through a script in R (the script is not included in this notebook).

Decompress the GFF3 and genome FASTA files:
```{r, engine='bash'}
cd ../input/genome
# gunzip replaces each .gz archive with its decompressed file
gunzip Montipora_capitata_HIv3.genes.gff3.gz
gunzip Montipora_capitata_HIv3.genes_fixed.gff3.gz
gunzip Montipora_capitata_HIv3.assembly.fasta.gz
```
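With the assembly decompressed, a quick sanity check is to count its sequences and total length before using it downstream. A minimal sketch using `awk`; `fasta_stats` is a hypothetical helper name, and it assumes a plain (uncompressed) FASTA in which every line not starting with `>` is sequence.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the number of sequences and total bases in a FASTA file.
fasta_stats() {
  awk '
    /^>/ { n++; next }                             # header line: start of a new sequence
    { gsub(/[ \t\r]/, ""); bp += length($0) }      # sequence line: count residues only
    END { printf "sequences=%d bases=%d\n", n, bp }
  ' "$1"
}
```

Usage: `fasta_stats ../input/genome/Montipora_capitata_HIv3.assembly.fasta`. Stripping whitespace and carriage returns keeps Windows line endings from inflating the base count.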
# Check file integrity with `md5sum`
::: callout-tip
[Learn How to Generate and Verify Files with MD5 Checksum in Linux](https://www.tecmint.com/generate-verify-check-files-md5-checksum-linux/)
:::
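Rather than comparing hashes by eye, `md5sum -c` can do the comparison. A minimal sketch, assuming GNU coreutils `md5sum`; `verify_md5` is a hypothetical helper. It also assumes the Rutgers downloads, once decompressed, are byte-identical to the owl.fish.washington.edu copies whose checksums are listed above. No checksum is listed for the `_fixed` GFF3, so that file is not covered.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check one file against an expected MD5 hash using
# md5sum's check mode. md5sum -c reads lines of the form "<hash>  <filename>"
# (two spaces), prints "<filename>: OK" or "<filename>: FAILED", and exits
# non-zero on a mismatch.
verify_md5() {
  local expected="$1" file="$2"
  printf '%s  %s\n' "$expected" "$file" | md5sum -c -
}
```

From inside `../input/genome`, for example: `verify_md5 99819eadba1b13ed569bb902eef8da08 Montipora_capitata_HIv3.assembly.fasta` and `verify_md5 5f6b80ba2885471c8c1534932ccb7e84 Montipora_capitata_HIv3.genes.gff3`.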
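Compute MD5 hashes for the downloaded files and compare them against the checksums listed in the Genome and GFF sections:

```{r, engine='bash'}
cd ../input/genome
# print "<hash>  <filename>" for every file whose name contains Montipora_capitata
md5sum *Montipora_capitata*
```

# Check out the genome GFF format

Preview the first 10 lines of the original GFF3 annotation: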
```{r, engine = 'bash'}
cd ../input/genome
head -10 Montipora_capitata_HIv3.genes.gff3
```
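Beyond the first 10 lines, a tally of column 3 (the feature type, in GFF3's nine tab-separated columns) gives a quick overview of what the annotation contains. A minimal sketch; `gff_feature_counts` is a hypothetical helper, and running it on both the original and `_fixed` files is one way to look for differences between them.

```bash
#!/usr/bin/env bash
# Hypothetical helper: tally the feature types (column 3) in a GFF3 file,
# skipping comment/directive lines that start with '#' and any line that
# does not have the full nine tab-separated columns.
gff_feature_counts() {
  awk -F'\t' '
    !/^#/ && NF >= 9 { count[$3]++ }
    END { for (t in count) print t "\t" count[t] }
  ' "$1" | sort
}
```

Usage: `gff_feature_counts ../input/genome/Montipora_capitata_HIv3.genes.gff3`. The output is sorted by feature type so runs on different files line up for comparison.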
# Summary & Next Steps
The V3 reference genome assembly and gene annotation files are now in `../input/genome`.
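Before moving on, it may be worth confirming that all three decompressed files are present and non-empty. A minimal sketch; `check_outputs` is a hypothetical helper, and the file list in the usage line assumes the names produced by the `gunzip` step above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: confirm that each named file exists in a directory and
# is non-empty; prints one line per file and returns 1 if any is missing.
check_outputs() {
  local dir="$1"; shift
  local status=0
  for f in "$@"; do
    if [[ -s "$dir/$f" ]]; then
      echo "found   $f"
    else
      echo "missing $f"
      status=1
    fi
  done
  return "$status"
}
```

Usage: `check_outputs ../input/genome Montipora_capitata_HIv3.assembly.fasta Montipora_capitata_HIv3.genes.gff3 Montipora_capitata_HIv3.genes_fixed.gff3`. Treating empty files as missing catches downloads that were interrupted or returned nothing.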
::: callout-tip
**Next Step:** Go to `02_get_sequences` to download raw FASTQ files from Azenta
:::
::: callout-important
###### Don't forget to back up with `rsync`!
```
rsync -avz /media/4TB_JPG_ext/stanja/gitprojects \
stanja@gannet.fish.washington.edu:/volume2/web/stanja/ravenbackup
```
:::