---
title: "Step 1: Get Genome Data using `curl` and `wget`"
subtitle: "Use `curl` and `wget` to pull a reference genome from Rutgers University"
author: "Sarah Tanja"
date: "`r format(Sys.time(), '%d %B, %Y')`"
format: gfm
---

# Overview

# Download Annotated Reference Genome for *Montipora capitata*

*Montipora capitata* Genome version V3, Rutgers University:

Genome publication:

Nucleotide Coding Sequence (CDS):

This code downloads the *Montipora capitata* coding-sequence FASTA file (`Montipora_capitata_HIv3.genes.cds.fna.gz`), along with the genome assembly and the gene annotation (GFF3).

```{r, engine='bash'}
# change to work in data directory
cd ../data

# download the coding sequences (CDS) to the data directory from the Rutgers server
curl -O http://cyanophora.rutgers.edu/montipora/Montipora_capitata_HIv3.genes.cds.fna.gz

# download the genome assembly and the gene annotation (GFF3)
wget http://cyanophora.rutgers.edu/montipora/Montipora_capitata_HIv3.assembly.fasta.gz
wget http://cyanophora.rutgers.edu/montipora/Montipora_capitata_HIv3.genes.gff3.gz
```

# GFF

Obtain the `Montipora_capitata_HIv3.genes_fixed.gff3` file by downloading it from GitHub.

```{r, engine='bash'}
cd ../data

wget https://github.com/AHuffmyer/EarlyLifeHistory_Energetics/raw/master/Mcap2020/Data/TagSeq/Montipora_capitata_HIv3.genes_fixed.gff3.gz
```

This was generated by running the original gff file `Montipora_capitata_HIv3.genes.gff3.gz` through this script in R:

Unzip the GFF and genome files:

```{r, engine='bash'}
cd ../data

gunzip Montipora_capitata_HIv3.genes.gff3.gz
gunzip Montipora_capitata_HIv3.genes_fixed.gff3.gz
gunzip Montipora_capitata_HIv3.assembly.fasta.gz
```

# Check file integrity with `md5sum`

::: callout-info
[Learn How to Generate and Verify Files with MD5 Checksum in Linux](https://www.tecmint.com/generate-verify-check-files-md5-checksum-linux/)
:::

```{r, engine='bash'}
cd ../data

# record the checksum of the transferred file
md5sum Sample_Info.csv > md5.transferred
```

```{r, engine='bash'}
cd ../data

# verify the file against the recorded checksum
# (cmp would compare the CSV to the checksum text itself and always report a difference)
md5sum -c md5.transferred
```

# Summary & Next Steps

The reference genome V3 files are now located in `../data/`.

::: callout-info
**Next Step:** Go to `02_get_sequences` to download raw FASTQ files from Azenta
:::
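
The integrity check in this document covers a single file. As a hedged sketch, the same `md5sum` approach can be wrapped in two small helpers that record checksums for several files at once and verify them later. The function names `write_manifest` and `verify_manifest` and the manifest file name used below are hypothetical and not part of the original workflow; the sketch assumes GNU `md5sum`.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; assumes GNU md5sum is installed.

# write_manifest DIR MANIFEST FILE...
# Writes one "checksum  filename" line per file into DIR/MANIFEST.
write_manifest() {
  local dir="$1" manifest="$2"
  shift 2
  # run in a subshell so the caller's working directory is unchanged
  ( cd "$dir" && md5sum "$@" > "$manifest" )
}

# verify_manifest DIR MANIFEST
# Recomputes each checksum; returns non-zero if any file differs or is missing.
verify_manifest() {
  local dir="$1" manifest="$2"
  # --quiet suppresses the "OK" line for files that pass
  ( cd "$dir" && md5sum -c --quiet "$manifest" )
}
```

For example, `write_manifest ../data md5.genome Montipora_capitata_HIv3.assembly.fasta Montipora_capitata_HIv3.genes_fixed.gff3` would record checksums right after the download, and `verify_manifest ../data md5.genome` could be rerun later to confirm the files are unchanged. A checksum recorded locally only detects later changes; confirming the transfer itself would need checksums published by the source server, which this document does not provide.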