---
title: "Step 1: Get Genome Data using `curl` and `wget`"
subtitle: "Use `curl` and `wget` to pull a reference genome from Rutgers University"
author: "Sarah Tanja"
date: "`r format(Sys.time(), '%d %B, %Y')`"
format: gfm
---

# Overview

# Genome

*Montipora capitata*

Genome version V3, Rutgers University: [`Montipora_capitata_HIv3.assembly.fasta`](https://owl.fish.washington.edu/halfshell/genomic-databank/Montipora_capitata_HIv3.assembly.fasta) (745MB)

- MD5 checksum: `99819eadba1b13ed569bb902eef8da08`

Genome publication:

::: callout-note
A compiled list of genomic resources for *Montipora capitata* can be found in the Roberts Lab handbook [here](https://robertslab.github.io/resources/Genomic-Resources/#montipora-capitata:~:text=mmag_pilon_scaffolds.fasta.fai-,Montipora%20capitata,-Genomes%3A)
:::

This code downloads the gzipped *Montipora capitata* genome assembly FASTA file.

```{r, engine='bash'}
# change to work in data directory
cd ../input/genome

wget http://cyanophora.rutgers.edu/montipora/Montipora_capitata_HIv3.assembly.fasta.gz
```

# GFF

[`Montipora_capitata_HIv3.genes.gff3`](https://owl.fish.washington.edu/halfshell/genomic-databank/Montipora_capitata_HIv3.genes.gff3) (67MB)

- MD5 checksum: `5f6b80ba2885471c8c1534932ccb7e84`

Obtain the `Montipora_capitata_HIv3.genes_fixed.gff3` file by downloading it from GitHub.

```{r, engine='bash'}
cd ../input/genome

wget http://cyanophora.rutgers.edu/montipora/Montipora_capitata_HIv3.genes.gff3.gz

wget https://github.com/AHuffmyer/EarlyLifeHistory_Energetics/raw/master/Mcap2020/Data/TagSeq/Montipora_capitata_HIv3.genes_fixed.gff3.gz
```

The `_fixed` file was generated by running the original GFF file `Montipora_capitata_HIv3.genes.gff3.gz` through this script in R:

Unzip the GFF and genome files:

```{r, engine='bash'}
cd ../input/genome

gunzip Montipora_capitata_HIv3.genes.gff3.gz
gunzip Montipora_capitata_HIv3.genes_fixed.gff3.gz
gunzip Montipora_capitata_HIv3.assembly.fasta.gz
```

# Check file integrity with `md5sum`

::: callout-info
[Learn How to Generate and Verify Files with MD5 Checksum in Linux](https://www.tecmint.com/generate-verify-check-files-md5-checksum-linux/)
:::

```{r, engine='bash'}
cd ../input/genome
md5sum *Montipora_capitata*
```

# Check out genome GFF format

```{r, engine = 'bash'}
cd ../input/genome
head -10 Montipora_capitata_HIv3.genes.gff3
```

# Summary & Next Steps

The reference genome V3 files are now located at `../input/genome`.

::: callout-info
**Next Step:** Go to `02_get_sequences` to download raw FASTQ files from Azenta
:::

::: callout-important
###### Don't forget to always rsync backup!

```
rsync -avz /media/4TB_JPG_ext/stanja/gitprojects \
stanja@gannet.fish.washington.edu:/volume2/web/stanja/ravenbackup
```
:::
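
###### Optional: compare checksums automatically

The `md5sum` step above only prints hashes, so they still have to be compared by eye against the MD5 values listed in the Genome and GFF sections. The sketch below is one way to automate that comparison. It assumes the listed checksums refer to the uncompressed files in `../input/genome`, and `check_md5` is a hypothetical helper name introduced here for illustration, not part of any existing tool.

```bash
#!/usr/bin/env bash
# Sketch: compare md5sum output against the checksums listed in this document.
# Assumption: the listed checksums refer to the uncompressed (gunzipped) files.

# check_md5 FILE EXPECTED_MD5
# Prints OK / MISMATCH / MISSING and returns 0 only when the hash matches.
check_md5() {
  local file="$1" expected="$2" actual
  if [ ! -f "$file" ]; then
    echo "MISSING  $file"
    return 2
  fi
  # md5sum prints "<hash>  <filename>"; keep only the hash
  actual="$(md5sum "$file" | awk '{print $1}')"
  if [ "$actual" = "$expected" ]; then
    echo "OK       $file"
  else
    echo "MISMATCH $file (expected $expected, got $actual)"
    return 1
  fi
}

genome_dir="../input/genome"  # same directory used by the chunks in this document

# "|| true" keeps the script going so every file gets reported
check_md5 "$genome_dir/Montipora_capitata_HIv3.assembly.fasta" 99819eadba1b13ed569bb902eef8da08 || true
check_md5 "$genome_dir/Montipora_capitata_HIv3.genes.gff3" 5f6b80ba2885471c8c1534932ccb7e84 || true
```

A `MISMATCH` line most likely means an incomplete download, in which case re-running `wget` for that file is the first thing to try. No checksum is listed here for `Montipora_capitata_HIv3.genes_fixed.gff3`, so that file is not checked.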