Week 5 Progress
Quantifying DGE in M. galloprovincialis utilizing existing RNASeq data
4/27/23
Project Overview
Use differential gene expression (DGE) to identify which physiological responses of M. galloprovincialis are common to, and which are unique to, the anthropogenic stressors listed below.
Accession    Tissue  Stressor
SRR19782039  Gill    Valsartan & Carbamazepine
SRR16771870  Gill    Synthetic 17α-ethinylestradiol
SRR7725722   Gill    Diarrhetic Shellfish Poisoning
SRR13013756  Gill    Hypoxia
Methods
- Choose organism & question
- Search NCBI for existing RNASeq files
- Align sequences using Kallisto
- Quantify DEG & Annotate
- Visualize Data
- Perform Gene Enrichment Analysis
Taking a look at my Kallisto output
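# Read the tab-separated kallisto count matrix
countmatrix <- read.delim("../output/kallisto_01.isoform.counts.matrix", header = TRUE, sep = '\t')
# Use the transcript IDs (column X) as row names, then drop that column
rownames(countmatrix) <- countmatrix$X
countmatrix <- countmatrix[,-1]
head(countmatrix)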
SRR16771870 SRR7725722
lcl|UYJE01009484.1_cds_VDI73888.1_70276 0.0000 0.0000
lcl|UYJE01010125.1_cds_VDI79971.1_75350 0.0000 0.0000
lcl|UYJE01003300.1_cds_VDI18171.1_24742 31.1053 19.7958
lcl|UYJE01005620.1_cds_VDI38780.1_41573 52.5000 151.5000
lcl|UYJE01006464.1_cds_VDI46223.1_47578 91.1492 255.4200
lcl|UYJE01006499.1_cds_VDI46496.1_47798 48.6272 62.0000
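Only 2 columns? Four runs were expected.
# Read the tab-separated kallisto count matrix
countmatrix <- read.delim("../output/kallisto_01.isoform.counts.matrix", header = TRUE, sep = '\t')
# Use the transcript IDs (column X) as row names, then drop that column
rownames(countmatrix) <- countmatrix$X
countmatrix <- countmatrix[,-1]
head(countmatrix)
# Confirm the number of rows (transcripts) and columns (samples)
dim(countmatrix)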
SRR16771870 SRR7725722
lcl|UYJE01009484.1_cds_VDI73888.1_70276 0.0000 0.0000
lcl|UYJE01010125.1_cds_VDI79971.1_75350 0.0000 0.0000
lcl|UYJE01003300.1_cds_VDI18171.1_24742 31.1053 19.7958
lcl|UYJE01005620.1_cds_VDI38780.1_41573 52.5000 151.5000
lcl|UYJE01006464.1_cds_VDI46223.1_47578 91.1492 255.4200
lcl|UYJE01006499.1_cds_VDI46496.1_47798 48.6272 62.0000
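The head() output above shows only two of the four accessions from the project overview. A quick way to name the missing ones without opening R is to compare the matrix header against the expected list. The sketch below is a minimal illustration, not part of the original workflow: the function name `missing_samples` is hypothetical, and it assumes the matrix is tab-separated with sample names on the first line.

```shell
# List expected accessions that have no column in a count matrix.
# Usage: missing_samples MATRIX ACCESSION...
# (hypothetical helper; assumes sample names are on the first line)
missing_samples() (
  matrix=$1
  shift
  # The first line of the matrix holds the sample (column) names.
  header=$(head -n 1 "$matrix")
  for acc in "$@"; do
    # -w matches whole words only; -F treats the accession as a literal string.
    if ! printf '%s\n' "$header" | grep -qwF -- "$acc"; then
      echo "$acc"
    fi
  done
)

# Example call, using the matrix path from the R code above:
# missing_samples ../output/kallisto_01.isoform.counts.matrix \
#   SRR19782039 SRR16771870 SRR7725722 SRR13013756
```

Each accession printed is one that never made it into the matrix, which narrows the troubleshooting to those runs.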
Troubleshooting Missing Sequences
This command finds every file ending in _1.fastq, strips the suffix to get the sample name, and runs kallisto quant on each sample:
find /home/shared/8TB_HDD_02/cnmntgna/GitHub/chris-musselcon/output/ncbi/*_1.fastq \
| xargs basename -s _1.fastq | xargs -I{} /home/shared/kallisto/kallisto \
quant -i ../data/MGAL_cds.index \
-o ../output/kallisto_01/{} \
-t 4 \
/home/shared/8TB_HDD_02/cnmntgna/GitHub/chris-musselcon/output/ncbi/{}_1.fastq \
/home/shared/8TB_HDD_02/cnmntgna/GitHub/chris-musselcon/output/ncbi/{}_2.fastq
The same command then expects a mate file ending in _2.fastq for each sample it found. Any run whose file does not end in _1.fastq is never passed to kallisto, and a run without a matching _2.fastq file cannot be quantified as paired-end.
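Since the kallisto command depends on the _1.fastq and _2.fastq suffixes, it may help to check how each accession is actually stored on disk. The following is a rough sketch under stated assumptions: `fastq_layout` is a hypothetical helper, and the unsuffixed name `ACCESSION.fastq` is only a guess at how a run without mate files might be named.

```shell
# Report which FASTQ layout exists in a directory for each accession.
# Usage: fastq_layout DIR ACCESSION...
# (hypothetical helper; the ACCESSION.fastq naming is an assumption)
fastq_layout() (
  dir=$1
  shift
  for acc in "$@"; do
    if [ -f "$dir/${acc}_1.fastq" ] && [ -f "$dir/${acc}_2.fastq" ]; then
      echo "$acc paired"        # both mates present
    elif [ -f "$dir/${acc}_1.fastq" ]; then
      echo "$acc mate_missing"  # matched by *_1.fastq, but no _2.fastq
    elif [ -f "$dir/${acc}.fastq" ]; then
      echo "$acc unsuffixed"    # never matched by the *_1.fastq pattern
    else
      echo "$acc not_found"     # no FASTQ file with any of these names
    fi
  done
)

# Example call, using the directory from the kallisto command:
# fastq_layout /home/shared/8TB_HDD_02/cnmntgna/GitHub/chris-musselcon/output/ncbi \
#   SRR19782039 SRR16771870 SRR7725722 SRR13013756
```

Any accession not reported as `paired` would explain a missing column, because the find pattern either skips it or hands kallisto a mate file that does not exist.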
Next Steps
- Run the 2 missing runs (SRR19782039 and SRR13013756) through Kallisto
- Run DESeq
- Quantify DEG & Annotate
- Visualize Data
- Perform Gene Enrichment Analysis
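If the missing runs turn out to be single-end (an assumption, nothing above confirms it), kallisto quant needs the `--single` flag plus a mean fragment length (`-l`) and standard deviation (`-s`). The sketch below only prints the command so it can be reviewed before running. The helper name `single_end_cmd`, the input file name, and the default values 200 and 20 are placeholders, not values from this project.

```shell
# Print (do not run) a single-end kallisto quant command for one accession.
# Usage: single_end_cmd ACCESSION FASTQ [FRAG_LEN] [FRAG_SD]
# (hypothetical helper; 200 and 20 are placeholder defaults, not measured values)
single_end_cmd() (
  acc=$1
  fastq=$2
  len=${3:-200}  # estimated mean fragment length
  sd=${4:-20}    # estimated fragment length standard deviation
  # Index, output directory, and thread count match the paired-end command.
  printf '%s quant -i %s -o %s/%s -t 4 --single -l %s -s %s %s\n' \
    /home/shared/kallisto/kallisto ../data/MGAL_cds.index \
    ../output/kallisto_01 "$acc" "$len" "$sd" "$fastq"
)

# Example call (the file name is illustrative):
# single_end_cmd SRR13013756 ../output/ncbi/SRR13013756.fastq
```

The fragment length and standard deviation should come from the library preparation details for each run, since kallisto cannot estimate them from single-end reads.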