Week 5 Progress

Quantifying DGE in M. galloprovincialis utilizing existing RNASeq data

C. Mantegna

4/27/23

Project Overview

Use differential gene expression (DGE) to determine which physiological responses M. galloprovincialis shares across, and which are unique to, the anthropogenic stressors listed below.

Accession ID   Tissue Type   Exposure
SRR19782039    Gill          Valsartan & Carbamazepine
SRR16771870    Gill          Synthetic 17α-Ethinylestradiol
SRR7725722     Gill          Diarrhetic Shellfish Poisoning
SRR13013756    Gill          Hypoxia

Methods

- Choose organism & question
- Search NCBI for existing RNASeq files
- Align sequences using Kallisto
- Quantify DEG & Annotate
- Visualize Data
- Perform Gene Enrichment Analysis

Preliminary Results

Taking a look at my Kallisto output

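# Read the tab-delimited Kallisto count matrix
countmatrix <- read.delim("../output/kallisto_01.isoform.counts.matrix", header = TRUE, sep = "\t")
# Use the transcript IDs (column X) as row names, then drop that column
rownames(countmatrix) <- countmatrix$X
countmatrix <- countmatrix[,-1]
head(countmatrix)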

                                        SRR16771870 SRR7725722
lcl|UYJE01009484.1_cds_VDI73888.1_70276      0.0000     0.0000
lcl|UYJE01010125.1_cds_VDI79971.1_75350      0.0000     0.0000
lcl|UYJE01003300.1_cds_VDI18171.1_24742     31.1053    19.7958
lcl|UYJE01005620.1_cds_VDI38780.1_41573     52.5000   151.5000
lcl|UYJE01006464.1_cds_VDI46223.1_47578     91.1492   255.4200
lcl|UYJE01006499.1_cds_VDI46496.1_47798     48.6272    62.0000

Only 2 columns? The matrix should have one column for each of the 4 accessions.

# Confirm the dimensions: rows are transcripts, columns are samples
dim(countmatrix)

[1] 78735     2
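To see which samples dropped out, the matrix header can be compared against the accession list from the Project Overview. The sketch below is a minimal, hypothetical helper (`missing_samples` is not part of the original workflow) and assumes the first line of the matrix holds the tab-separated sample names.

```shell
# Hypothetical helper: print every accession that is absent from the
# header (first line) of a count matrix.
# Usage: missing_samples MATRIX ACCESSION...
missing_samples() {
  local matrix header acc
  matrix="$1"; shift
  header="$(head -n 1 "$matrix")"
  for acc in "$@"; do
    # -w matches whole words only; -F treats the accession as a literal string
    if ! printf '%s\n' "$header" | grep -qwF -- "$acc"; then
      echo "$acc"
    fi
  done
}

# Example call (path taken from the R code above):
# missing_samples ../output/kallisto_01.isoform.counts.matrix \
#   SRR19782039 SRR16771870 SRR7725722 SRR13013756
```

Any accession it prints never made it into the matrix and needs to be traced back through the Kallisto step.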

Troubleshooting Missing Sequences

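This command finds the read files ending in _1.fastq, strips that suffix to get each sample name, and runs Kallisto on the sample's _1.fastq and _2.fastq pair.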

# List forward-read files, reduce each to its sample name, then run kallisto quant per sample
find /home/shared/8TB_HDD_02/cnmntgna/GitHub/chris-musselcon/output/ncbi/*_1.fastq \
| xargs basename -s _1.fastq  | xargs -I{} /home/shared/kallisto/kallisto \
quant -i ../data/MGAL_cds.index \
-o ../output/kallisto_01/{} \
-t 4 \
/home/shared/8TB_HDD_02/cnmntgna/GitHub/chris-musselcon/output/ncbi/{}_1.fastq \
/home/shared/8TB_HDD_02/cnmntgna/GitHub/chris-musselcon/output/ncbi/{}_2.fastq

Because the search only matches files ending in _1.fastq, and each match must have a partner ending in _2.fastq, any run whose files are not named this way is never quantified.
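One possible cause worth checking is whether the two missing runs have files with the `_1.fastq` and `_2.fastq` names at all. The sketch below is a hypothetical helper (`fastq_layout` is an illustrative name, not part of the original workflow). It assumes that a single-end run would have been written as `ACCESSION.fastq`, which may not match how these files were actually downloaded.

```shell
# Hypothetical helper: report which FASTQ files exist for each accession.
# Assumption: single-end runs are stored as ACCESSION.fastq.
# Usage: fastq_layout DIR ACCESSION...
fastq_layout() {
  local dir acc
  dir="$1"; shift
  for acc in "$@"; do
    if [ -f "$dir/${acc}_1.fastq" ] && [ -f "$dir/${acc}_2.fastq" ]; then
      echo "$acc paired"      # both mates present: picked up by the command above
    elif [ -f "$dir/${acc}.fastq" ]; then
      echo "$acc single"      # one unsuffixed file: skipped by the *_1.fastq search
    else
      echo "$acc missing"     # no matching FASTQ found under these names
    fi
  done
}

# Example call (directory taken from the command above):
# fastq_layout /home/shared/8TB_HDD_02/cnmntgna/GitHub/chris-musselcon/output/ncbi \
#   SRR19782039 SRR16771870 SRR7725722 SRR13013756
```

If a run does turn out to be single-end, `kallisto quant` needs `--single` together with `-l` and `-s` (mean and standard deviation of the fragment length), and those values would have to be estimated for each library.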

Next Steps

- Run 2 missing sequences through Kallisto
- Run DESeq
- Quantify DEG & Annotate
- Visualize Data
- Perform Gene Enrichment Analysis