After cleaning and concentrating all extractions using KAPA Pure Beads, used the TapeStation with a Genomic Tape to check DNA concetration, quality, and size distribution. Followed protocol for Agilent 4200 gDNA .
See full TapeStation summaries here: Tape 1 summary , Tape 2 summary
(Note the above plots have different y-axis scales)
1
14366
1886
-
1.57
63.15
1
19054
1898
1.2
21.6
82.13
1
52295
1912
1.0
19.9
61.63
1
1180630
1912
1.0
4.33
62.25
1
50603
1912
1.0
42.4
73.82
1
51732
1960
2.1
157
93.80
1
51892
1960
1.9
137
94.20
1
14399
1886
1.2
100
86.10
1
42137
1898
1.1
46.3
86.49
1
51730
1960
1.1
59.6
71.13
1
51728
1960
1.4
23.7
89.90
1
100609
2000
2.9
261
96.34
1
100610
2000
1.9
164
93.57
1
1007393
2000
2.8
218
93.49
1
1018355
2002
1.8
115
89.98
2
1606824
2019
1.7
353
90.80
2
1606826
2019
3.4
408
97.21
2
1740336
2019
3.1
368
96.99
2
1740363
2019
3.3
343
97.40
2
1740390
2019
2.4
291
96.83
2
1740407
2019
3.1
260
96.89
2
50368
1880
1.7
68.3
91.68
2
51727
1960
1.3
15.3
87.12
2
51729
1960
1.5
18.7
89.23
2
51857
1960
1.4
109
93.00
2
51858
1960
1.3
24.2
88.78
2
51859
1960
1.5
117
92.46
2
51860
1960
1.4
15.6
86.46
2
51861
1960
1.7
94.1
94.72
Warning: package 'googlesheets4' was built under R version 4.2.3
Warning: package 'tidyr' was built under R version 4.2.3
Warning: package 'ggplot2' was built under R version 4.2.3
Warning: package 'plotly' was built under R version 4.2.3
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
conc <- read_sheet ("https://docs.google.com/spreadsheets/d/1zDMy7AOCM8Ug85q4vr9CoMDxfF-zDJk1DnaxakSt1Hg/edit?usp=sharing" )
! Using an auto-discovered, cached token.
To suppress this message, modify your code or options to clearly consent to
the use of a cached token.
See gargle's "Non-interactive auth" vignette for more details:
<https://gargle.r-lib.org/articles/non-interactive-auth.html>
ℹ The googlesheets4 package is using a cached token for 'kdurkin1@uw.edu'.
✔ Reading from "E_tourneforti".
✔ Range 'E_tourneforti_samples'.
conc$ Collection.Age <- ifelse (conc$ Year.Collected < 1920 , "pre-1920" ,
ifelse (conc$ Year.Collected > 1999 , "Modern" , "1960s-70s" ))
conc$ Collection.Age <- factor (conc$ Collection.Age,
levels = c ("pre-1920" , "1960s-70s" , "Modern" ),
ordered = TRUE )
p1 <- ggplot (conc, aes (x= Tapestation.Perc.300 up,
y= Tapestation.Conc.ngul,
color = Collection.Age,
text = paste ("Catalog #:" , Catalog.Number,
"<br>Year:" , Year.Collected,
"<br>% >300kb:" , Tapestation.Perc.300 up,
"<br>Conc:" , Tapestation.Conc.ngul))) +
geom_jitter (size = 2 , width = 0.25 , height = 0 ) +
geom_hline (yintercept = 13.33 , color = 'black' ) +
theme_minimal () +
theme (axis.text.x = element_text (angle = 45 , hjust = 1 )) +
xlab ("Percent of reads >300kb" ) +
ylab ("Concentration (ng/uL)" )
p2 <- ggplot (conc, aes (x= Tapestation.DIN,
y= Tapestation.Conc.ngul,
color= Collection.Age,
text = paste ("Catalog #:" , Catalog.Number,
"<br>Year:" , Year.Collected,
"<br>DIN:" , Tapestation.DIN,
"<br>Conc:" , Tapestation.Conc.ngul))) +
geom_jitter (size = 2 , width = 0.25 , height = 0 ) +
geom_hline (yintercept = 13.33 , color = 'black' ) +
theme_minimal () +
theme (axis.text.x = element_text (angle = 45 , hjust = 1 )) +
xlab ("DNA Integrity Number (DIN)" ) +
ylab ("Concentration (ng/uL)" )
p3 <- ggplot (conc, aes (x= Tapestation.DIN,
y= Tapestation.Perc.300 up,
color= Collection.Age,
text = paste ("Catalog #:" , Catalog.Number,
"<br>Year:" , Year.Collected,
"<br>DIN:" , Tapestation.DIN,
"<br>% >300kb:" , Tapestation.Perc.300 up))) +
geom_jitter (size = 2 , width = 0.25 , height = 0 ) +
geom_hline (yintercept = 13.33 , color = 'black' ) +
theme_minimal () +
theme (axis.text.x = element_text (angle = 45 , hjust = 1 )) +
xlab ("DNA Integrity Number (DIN)" ) +
ylab ("Percent of reads >300kb" )
ggplotly (p1, tooltip = "text" )
ggplotly (p2, tooltip = "text" )
ggplotly (p3, tooltip = "text" )
Sample selection
Based on the above results, I can pick which samples are most suited for sequencing, considering DNA concentration (as a proxy for total DNA available) and DNA integrity. I’m not sure whether % >300bp reads or DIN is the most appropriate measure of DNA integrity here, but they’re highly correlated. I want n=5 for each of my three collection periods.
Modern (<25 yo)
USNM 1606826
USNM 1740336
USNM 1740363
USNM 100609
USNM 1740407
Mid-age (1960s-1970s):
USNM 51732
USNM 51892
USNM 51861
USNM 51857
USNM 51859
Historic (>100 yo):
USNM 50368
USNM 14399
USNM 42137
USNM 19054
USNM 50603
selected <- c ("USNM 1606826" , "USNM 1740336" , "USNM 1740363" , "USNM 100609" , "USNM 1740407" , "USNM 51732" , "USNM 51892" , "USNM 51861" , "USNM 51857" , "USNM 51859" , "USNM 50368" , "USNM 14399" , "USNM 42137" , "USNM 19054" , "USNM 50603" )
conc$ Selection <- ifelse (conc$ Catalog.Number %in% selected, "yes" , "no" )
p21 <- ggplot (conc, aes (x= Tapestation.Perc.300 up,
y= Tapestation.Conc.ngul,
color = Collection.Age,
shape= Selection,
text = paste ("Catalog #:" , Catalog.Number,
"<br>Year:" , Year.Collected,
"<br>% >300kb:" , Tapestation.Perc.300 up,
"<br>Conc:" , Tapestation.Conc.ngul))) +
geom_jitter (size = 2 , width = 0.25 , height = 0 ) +
geom_hline (yintercept = 13.33 , color = 'black' ) +
theme_minimal () +
theme (axis.text.x = element_text (angle = 45 , hjust = 1 )) +
xlab ("Percent of reads >300kb" ) +
ylab ("Concentration (ng/uL)" )
p22 <- ggplot (conc, aes (x= Tapestation.DIN,
y= Tapestation.Conc.ngul,
color= Collection.Age,
shape= Selection,
text = paste ("Catalog #:" , Catalog.Number,
"<br>Year:" , Year.Collected,
"<br>DIN:" , Tapestation.DIN,
"<br>Conc:" , Tapestation.Conc.ngul))) +
geom_jitter (size = 2 , width = 0.25 , height = 0 ) +
geom_hline (yintercept = 13.33 , color = 'black' ) +
theme_minimal () +
theme (axis.text.x = element_text (angle = 45 , hjust = 1 )) +
xlab ("DNA Integrity Number (DIN)" ) +
ylab ("Concentration (ng/uL)" )
ggplotly (p21, tooltip = "text" )
ggplotly (p22, tooltip = "text" )