---
title: "Step 1: Get Genome Data using `curl` and `wget`"
subtitle: "Use `curl` and `wget` to pull a reference genome from Rutgers University"
author: "Sarah Tanja"
date: "`r format(Sys.time(), '%d %B, %Y')`"
format: gfm
---

# Overview

# Download Annotated Reference Genome for *Montipora capitata*

*Montipora capitata* Genome version V3, Rutgers University: <http://cyanophora.rutgers.edu/montipora/>

Genome publication: <https://academic.oup.com/gigascience/article/doi/10.1093/gigascience/giac098/6815755>

Nucleotide Coding Sequence (CDS): <http://cyanophora.rutgers.edu/montipora/Montipora_capitata_HIv3.genes.cds.fna.gz>

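This code downloads three gzipped *Montipora capitata* V3 files from the Rutgers server: the gene coding sequences (`genes.cds.fna.gz`), the genome assembly (`assembly.fasta.gz`), and the gene annotations (`genes.gff3.gz`).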

```{r, engine='bash'}
# work in the data directory
cd ../data
# download the CDS, assembly, and GFF3 files from the Rutgers server
curl -O http://cyanophora.rutgers.edu/montipora/Montipora_capitata_HIv3.genes.cds.fna.gz

wget http://cyanophora.rutgers.edu/montipora/Montipora_capitata_HIv3.assembly.fasta.gz

wget http://cyanophora.rutgers.edu/montipora/Montipora_capitata_HIv3.genes.gff3.gz
```
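
Re-running the chunk above downloads every file again. A minimal sketch of a helper that skips files already on disk is shown here; the function name `fetch_if_missing` is hypothetical and not part of the original workflow, and it assumes `curl` is available.

```bash
# Hypothetical helper (not part of the original workflow):
# download a URL into the current directory unless a non-empty
# file with the same name is already there.
fetch_if_missing() {
  local url="$1"
  local file="${url##*/}"   # keep only the text after the last slash
  if [ -s "$file" ]; then
    echo "skipping ${file} (already present)"
  else
    # --fail: exit non-zero on HTTP errors instead of saving an error page
    # --location: follow redirects; --retry 3: retry transient failures
    curl --fail --location --retry 3 -O "$url"
  fi
}
```

Called from `../data`, for example as `fetch_if_missing http://cyanophora.rutgers.edu/montipora/Montipora_capitata_HIv3.genes.cds.fna.gz`, it would download the file only on the first run.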

# GFF

Download the corrected annotation file, `Montipora_capitata_HIv3.genes_fixed.gff3.gz`, from GitHub.

```{r, engine='bash'}
cd ../data
wget https://github.com/AHuffmyer/EarlyLifeHistory_Energetics/raw/master/Mcap2020/Data/TagSeq/Montipora_capitata_HIv3.genes_fixed.gff3.gz
```

This file was generated by running the original GFF file `Montipora_capitata_HIv3.genes.gff3.gz` through this R script: <https://github.com/AHuffmyer/EarlyLifeHistory_Energetics/blob/master/Mcap2020/Scripts/TagSeq/Genome_V3/fix_gff_format.Rmd>

Unzip both GFF files and the genome assembly:

```{r, engine='bash'}
cd ../data
gunzip Montipora_capitata_HIv3.genes.gff3.gz
gunzip Montipora_capitata_HIv3.genes_fixed.gff3.gz
gunzip Montipora_capitata_HIv3.assembly.fasta.gz
```
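
A truncated download can still produce a file with the right name, so it may be worth testing an archive before decompressing it. The sketch below uses a toy FASTA file in a temporary directory (the file name `toy.fna` and its two records are made up for illustration) to show `gzip -t` and a record count that reads the archive without unzipping it to disk.

```bash
# Toy data in a temporary directory, so nothing in ../data is touched.
workdir=$(mktemp -d)
printf '>gene1\nATGC\n>gene2\nGGCC\n' > "$workdir/toy.fna"
gzip "$workdir/toy.fna"            # produces toy.fna.gz

# gzip -t exits non-zero if the archive is truncated or corrupt.
if gzip -t "$workdir/toy.fna.gz"; then
  gzip_status="ok"
else
  gzip_status="corrupt"
fi
echo "archive check: $gzip_status"

# Count FASTA records (header lines start with '>') while streaming
# the decompressed content to stdout instead of writing it to disk.
n_records=$(gzip -dc "$workdir/toy.fna.gz" | grep -c '^>')
echo "records: $n_records"
```

The same two commands could be pointed at the downloaded `.gz` files in `../data` before running `gunzip`.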

# Check file integrity with `md5sum`

::: callout-note
[Learn How to Generate and Verify Files with MD5 Checksum in Linux](https://www.tecmint.com/generate-verify-check-files-md5-checksum-linux/)
:::

```{r, engine='bash'}
cd ../data
md5sum Sample_Info.csv > md5.transferred
```

```{r, engine='bash'}
cd ../data
# verify the file against the recorded checksum (prints "Sample_Info.csv: OK" on a match)
md5sum -c md5.transferred
```
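
The same `md5sum` approach extends to several files at once by recording one checksum line per file in a manifest. This is a minimal sketch with toy files in a temporary directory (the names `toy_a.txt`, `toy_b.txt`, and `toy.md5` are made up for illustration), assuming GNU `md5sum`.

```bash
# Toy files in a temporary directory, so nothing in ../data is touched.
workdir=$(mktemp -d)
cd "$workdir"
printf 'a\n' > toy_a.txt
printf 'b\n' > toy_b.txt

# Record one checksum line per file in a manifest.
md5sum toy_a.txt toy_b.txt > toy.md5

# Verify every file listed in the manifest; --quiet prints only failures.
if md5sum --check --quiet toy.md5; then
  verify_status="all files match"
else
  verify_status="mismatch detected"
fi
echo "$verify_status"

# Simulate a corrupted file and verify again.
printf 'changed\n' > toy_b.txt
if md5sum --check --quiet toy.md5 2>/dev/null; then
  after_change="all files match"
else
  after_change="mismatch detected"
fi
echo "$after_change"
```

A manifest only proves a transfer was clean when it was created from the files at their source; one generated after downloading can only show that the local copies have not changed since.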

# Summary & Next Steps

The V3 reference genome files are now in `../data/`.

::: callout-note
**Next Step:** Go to `02_get_sequences` to download raw FASTQ files from Azenta
:::