---
title: "Step 1: Get Genome Data using `curl` and `wget`"
subtitle: "Use `curl` and `wget` to pull a reference genome from Rutgers University"
author: "Sarah Tanja"
date: "`r format(Sys.time(), '%d %B, %Y')`"
format: gfm
---
# Overview
# Download Annotated Reference Genome for *Montipora capitata*
*Montipora capitata* Genome version V3, Rutgers University:
Genome publication:
Nucleotide Coding Sequence (CDS):
This code downloads three gzipped files for *Montipora capitata*: the gene coding sequences (CDS) FASTA, the genome assembly FASTA, and the gene annotation (GFF3).
```{r, engine='bash'}
# change to work in data directory
cd ../data
# download the CDS, assembly, and GFF3 files from the Rutgers server into the data directory
curl -O http://cyanophora.rutgers.edu/montipora/Montipora_capitata_HIv3.genes.cds.fna.gz
wget http://cyanophora.rutgers.edu/montipora/Montipora_capitata_HIv3.assembly.fasta.gz
wget http://cyanophora.rutgers.edu/montipora/Montipora_capitata_HIv3.genes.gff3.gz
```
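Large downloads can be interrupted or repeated by accident when the notebook is re-rendered. The following is a minimal sketch, not part of the original workflow, of a download wrapper that skips files already present and then tests the gzip archive without extracting it. The helper names `fetch_if_missing` and `check_gz` are hypothetical, and the sketch assumes `curl` and `gzip` are available on the system.

```bash
# Hypothetical helpers (names are illustrative, not from the original workflow).

# Download URL into DEST_DIR unless a non-empty file with the same name exists.
fetch_if_missing() {
  local url="$1"
  local dest_dir="${2:-.}"
  local file
  file="${dest_dir}/$(basename "$url")"
  if [ -s "$file" ]; then
    echo "skip: $file already exists"
    return 0
  fi
  # -f: fail on HTTP errors, -L: follow redirects, --retry 3: retry transient failures
  curl -fL --retry 3 -o "$file" "$url"
}

# Test a gzip archive for corruption without unpacking it.
check_gz() {
  if gzip -t "$1" 2>/dev/null; then
    echo "ok: $1"
  else
    echo "corrupt: $1"
    return 1
  fi
}
```

A call such as `fetch_if_missing http://cyanophora.rutgers.edu/montipora/Montipora_capitata_HIv3.assembly.fasta.gz ../data` followed by `check_gz ../data/Montipora_capitata_HIv3.assembly.fasta.gz` would cover the assembly file. The same pattern applies to the CDS and GFF3 files.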
# GFF
Download the `Montipora_capitata_HIv3.genes_fixed.gff3` file from GitHub.
```{r, engine='bash'}
cd ../data
wget https://github.com/AHuffmyer/EarlyLifeHistory_Energetics/raw/master/Mcap2020/Data/TagSeq/Montipora_capitata_HIv3.genes_fixed.gff3.gz
```
The fixed file was generated by running the original GFF file `Montipora_capitata_HIv3.genes.gff3.gz` through this script in R:
Unzip the GFF files and the genome assembly:
```{r, engine='bash'}
cd ../data
gunzip Montipora_capitata_HIv3.genes.gff3.gz
gunzip Montipora_capitata_HIv3.genes_fixed.gff3.gz
gunzip Montipora_capitata_HIv3.assembly.fasta.gz
```
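After unzipping, a quick record count helps confirm that the files are not empty or truncated. The sketch below is an optional sanity check, assuming standard FASTA and GFF3 formatting. The function names `count_fasta_records` and `count_gff_features` are hypothetical, and no expected counts are given because the source does not state them.

```bash
# Hypothetical helpers (names are illustrative, not from the original workflow).

# Count sequences in a FASTA file: every record starts with a '>' header line.
count_fasta_records() {
  grep -c '^>' "$1"
}

# Count feature lines in a GFF3 file: skip '#' comment/directive lines and blank lines.
count_gff_features() {
  grep -vc -e '^#' -e '^$' "$1"
}
```

For example, `count_fasta_records Montipora_capitata_HIv3.assembly.fasta` reports the number of sequences in the assembly, and `count_gff_features Montipora_capitata_HIv3.genes.gff3` reports the number of annotated features.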
# Check file integrity with `md5sum`
::: callout-info
[Learn How to Generate and Verify Files with MD5 Checksum in Linux](https://www.tecmint.com/generate-verify-check-files-md5-checksum-linux/)
:::
```{r, engine='bash'}
cd ../data
# record the MD5 checksum of the file
md5sum Sample_Info.csv > md5.transferred
```
```{r, engine='bash'}
cd ../data
# re-hash the file and compare it against the recorded checksum
md5sum -c md5.transferred
```
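The same approach extends to the downloaded genome files, which are the ones most likely to be damaged in transfer. The following is a minimal sketch, assuming GNU `md5sum` is installed. The helper names `write_checksums` and `verify_checksums` and the manifest name used in the usage note are hypothetical.

```bash
# Hypothetical helpers (names are illustrative, not from the original workflow).

# Record MD5 checksums for one or more files in a manifest.
# usage: write_checksums MANIFEST FILE...
write_checksums() {
  local manifest="$1"
  shift
  md5sum "$@" > "$manifest"
}

# Re-hash every file listed in the manifest and compare with the recorded digest.
# Paths are stored as given, so run this from the same directory as write_checksums.
# Returns non-zero if any file is missing or its checksum differs.
verify_checksums() {
  md5sum -c "$1"
}
```

From inside `../data`, `write_checksums md5.genome Montipora_capitata_HIv3.*` records the digests, and `verify_checksums md5.genome` later confirms that the files are unchanged. This only detects changes made after the manifest was written. Verifying the download itself requires a checksum published by the data provider.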
# Summary & Next Steps
The reference genome V3 files are now located in `../data/`.
::: callout-info
**Next Step:** Go to `02_get_sequences` to download raw FASTQ files from Azenta.
:::