Environment pre-reqs:
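# install missing packages; SRAdb is a Bioconductor package, so BiocManager is needed first
if (!requireNamespace("tidyverse", quietly = TRUE)) install.packages("tidyverse")
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
if (!requireNamespace("SRAdb", quietly = TRUE)) BiocManager::install("SRAdb")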
library(SRAdb)
library(tidyverse)
First we want to get an idea of the files we are downloading and the samples that generated the data. We will start by looking at the metadata for the samples.
# navigate to data directory
cd ../data
# download metadata from the git repo into the data directory
curl -O https://raw.githubusercontent.com/AHuffmyer/EarlyLifeHistory_Energetics/master/Mcap2020/Data/TagSeq/Sample_Info.csv
# read the data into R and name it metadata
metadata <- read_csv("../data/Sample_Info.csv")
md5sum
cd ../data
md5sum Sample_Info.csv > md5.transferred
cd ../data
cmp Sample_Info.csv md5.transferred
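cmp reports that these files differ, which is expected rather than a Windows vs Unix issue: cmp compares the two files byte for byte, and md5.transferred contains the checksum text, not a copy of the CSV. The "1 byte" is most likely the position of the first mismatch (cmp prints a message of the form "differ: byte 1, line 1"), not the size of the difference. To verify the file, run md5sum -c md5.transferred, or compare the checksum against one computed on the source copy.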
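As a point of comparison, a checksum is usually verified with md5sum -c, which recomputes the hash of the file and compares it with the recorded value, rather than with cmp. The sketch below is a minimal demonstration in a scratch directory. It assumes GNU coreutils md5sum; demo.csv and demo.md5 are placeholder names, not files from this project.

```shell
# Minimal sketch: demo.csv and demo.md5 are placeholder names (assumptions)
workdir=$(mktemp -d)
cd "$workdir"

# stand-in for a downloaded file
printf 'a,b\n1,2\n' > demo.csv

# record the checksum, in the same way as for Sample_Info.csv
md5sum demo.csv > demo.md5

# md5sum -c recomputes the hash and compares it with the recorded one
if md5sum -c demo.md5 > /dev/null 2>&1; then
  unchanged_result="match"
else
  unchanged_result="mismatch"
fi
echo "unchanged file: $unchanged_result"

# simulate a Windows-style line-ending change (LF -> CRLF)
printf 'a,b\r\n1,2\r\n' > demo.csv
if md5sum -c demo.md5 > /dev/null 2>&1; then
  changed_result="match"
else
  changed_result="mismatch"
fi
echo "after CRLF conversion: $changed_result"
```

A line-ending change like the one simulated here does alter the checksum, so this kind of check would catch a genuine Windows vs Unix difference if one existed.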
#look at the metadata
head(metadata)
This tibble has 39 samples (rows, AH1 to AH39) and 8 metadata columns. The samples are Montipora capitata corals collected at different life stages (given by the columns time-stage and code); RNA was extracted and sequenced using Tag-Seq.
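The row and column counts can also be cross-checked from the command line. This is a minimal sketch on a placeholder CSV (its contents are made up for illustration), and it assumes the file has a single header line and no quoted commas.

```shell
# Minimal sketch: the CSV written here is a placeholder, not Sample_Info.csv
csv=$(mktemp)
printf 'id,group\nS1,x\nS2,y\nS3,z\n' > "$csv"

# number of columns: split the header line on commas (assumes no quoted commas)
n_cols=$(head -n 1 "$csv" | tr ',' '\n' | wc -l | tr -d ' ')

# number of samples: every line after the header
n_rows=$(tail -n +2 "$csv" | wc -l | tr -d ' ')

echo "columns: $n_cols, rows: $n_rows"
```

Pointed at ../data/Sample_Info.csv instead of the placeholder, the counts should agree with the 39 rows and 8 columns seen in R, provided those assumptions hold for that file.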
Reference: Roberts Lab Resources GitHub issue #1569 thread
Using sratoolkit.3.0.2-ubuntu64, which is already downloaded in the /home/shared folder:
/home/shared/sratoolkit.3.0.2-ubuntu64/bin/fasterq-dump \
--outdir /home/shared/8TB_HDD_01/mcap \
--progress \
SRR22293447 \
SRR22293448 \
SRR22293449 \
SRR22293450 \
SRR22293451 \
SRR22293452 \
SRR22293453 \
SRR22293454
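Because the accessions are consecutive, the list can be generated instead of typed out. The following is a minimal sketch that only prints the command (a dry run) and does not download anything; the paths and flags are copied from the command above, and the variable names are my own.

```shell
# Minimal sketch: builds and prints the command without running it
fasterq=/home/shared/sratoolkit.3.0.2-ubuntu64/bin/fasterq-dump
outdir=/home/shared/8TB_HDD_01/mcap

# SRR22293447 to SRR22293454 are consecutive, so a counter can generate them
accessions=""
n_acc=0
n=22293447
while [ "$n" -le 22293454 ]; do
  accessions="$accessions SRR$n"
  n_acc=$((n_acc + 1))
  n=$((n + 1))
done
echo "number of accessions: $n_acc"

# print the command; drop the leading echo to actually run the download
# ($accessions is left unquoted on purpose so it splits into separate arguments)
echo "$fasterq" --outdir "$outdir" --progress $accessions
```

Printing the command first makes it easy to confirm the accession range before starting a long download.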
Absolute path to the fastq files on raven:
/home/shared/8TB_HDD_01/mcap/
Relative path to the fastq files on raven (this depends on the current working directory):
cd ../../../../../8TB_HDD_01/mcap/
Check that the fastq files are downloaded:
cd /home/shared/8TB_HDD_01/mcap/
ls
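Rather than reading the ls output by eye, each accession can be checked for at least one fastq file. This minimal sketch runs in a scratch directory with empty placeholder files standing in for real downloads. It assumes fasterq-dump names its output after the accession (for example ACC.fastq, or ACC_1.fastq and ACC_2.fastq for paired reads).

```shell
# Minimal sketch: the touch line fakes fasterq-dump output so the check
# can be tried without the real data
fastq_dir=$(mktemp -d)
touch "$fastq_dir/SRR22293447.fastq" "$fastq_dir/SRR22293448.fastq"

missing=""
for acc in SRR22293447 SRR22293448 SRR22293449; do
  # ls fails when no file matches the pattern ACC*.fastq
  if ls "$fastq_dir/$acc"*.fastq > /dev/null 2>&1; then
    echo "$acc: found"
  else
    echo "$acc: MISSING"
    missing="$missing $acc"
  fi
done
echo "missing:$missing"
```

To use it on raven, fastq_dir would be set to /home/shared/8TB_HDD_01/mcap, the touch line removed, and the loop extended to all eight accessions.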