Soil metagenomes from greenhouse study with biofertilizer

Overview

We are analyzing metagenomes from two soil samples that were collected for a greenhouse study before biofertilizer treatment. They will give us a sense of the resident microbial (bacterial and fungal) communities in each soil.

  • F1B-KM40 - bulk field soil
  • F2R-KM41 - rhizosphere soil

The goal is to generate phylogenetic trees of bacteria and fungi in these samples.

Methods

  • Confirm data integrity with checksums
  • Quality control: FastQC and MultiQC
  • Preprocessing: trimming via Trimmomatic, merging with fastp
  • Assembly via MEGAHIT
  • Tree construction using MEGAN
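
The checksum step in the list above could look like the following minimal sketch. It assumes the sequencing facility supplied a manifest in md5sum format (one "hash  filename" pair per line); the manifest name checksums.md5 and the function name verify_checksums are hypothetical.

```shell
# Sketch: verify downloaded files against an md5sum-format manifest.
# ASSUMPTION: the manifest lists file names relative to its own directory.
verify_checksums() {
  local manifest="$1"
  # Run inside the manifest's directory so relative file names resolve.
  # --quiet prints only failures; the exit status is non-zero on any mismatch.
  ( cd "$(dirname "$manifest")" && md5sum --check --quiet "$(basename "$manifest")" )
}

# Example call (hypothetical manifest name):
# verify_checksums checksums.md5 && echo "All files intact"
```

A non-zero exit status means at least one file differs from its recorded hash and should be downloaded again before any QC is run.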

Trimming

Trimming removes adapters and low-quality bases.

This produces paired and unpaired output files. Paired reads are those for which both the forward and the reverse read survived trimming; they are used for downstream steps such as merging and assembly. Unpaired reads are those for which only one read of the pair survived (the other was discarded because of low quality or short length).
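
Because the paired outputs should contain exactly the same number of reads in R1 and R2, a quick count is a useful sanity check before merging. The sketch below assumes the file naming used later in this report (<sample>_trimmed_R1_paired.fastq.gz and its R2 counterpart); the function names are hypothetical.

```shell
# Count reads in a gzipped FASTQ file (one record = 4 lines).
count_reads() {
  local lines
  lines=$(gzip -dc "$1" | wc -l)
  echo $(( $lines / 4 ))
}

# Compare R1 and R2 read counts for one sample prefix.
# ASSUMPTION: files are named <prefix>_trimmed_R{1,2}_paired.fastq.gz.
check_pairs() {
  local prefix="$1"
  local n1 n2
  n1=$(count_reads "${prefix}_trimmed_R1_paired.fastq.gz")
  n2=$(count_reads "${prefix}_trimmed_R2_paired.fastq.gz")
  if [ "$n1" = "$n2" ]; then
    echo "$prefix: $n1 read pairs"
  else
    echo "$prefix: MISMATCH (R1=$n1, R2=$n2)" >&2
    return 1
  fi
}

# Example calls for the two samples in this report:
# check_pairs F1B-KM40
# check_pairs F2R-KM41
```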

Merging

The paired outputs from trimming are separate R1 (forward) and R2 (reverse) files, which have to be merged into single reads. This is the last preprocessing step before metagenome assembly.

/home/shared/fastp-v0.24.0/fastp \
  -i F1B-KM40_trimmed_R1_paired.fastq.gz \
  -I F1B-KM40_trimmed_R2_paired.fastq.gz \
  --merge \
  --merged_out F1B-KM40_merged.fastq.gz
  
/home/shared/fastp-v0.24.0/fastp \
  -i F2R-KM41_trimmed_R1_paired.fastq.gz \
  -I F2R-KM41_trimmed_R2_paired.fastq.gz \
  --merge \
  --merged_out F2R-KM41_merged.fastq.gz 

Assembly

The merged reads were assembled with MEGAHIT.

# -r: merged (single-end) input reads; -o: output directory
# --min-contig-len 500: keep contigs of at least 500 bp; -t 8: use 8 threads
./megahit \
  -r ../F1B-KM40_merged.fastq.gz \
  -o megahit_F1B_KM40_out \
  --min-contig-len 500 \
  -t 8

As with the other steps, the same command is run for the second sample.

./megahit \
  -r ../F2R-KM41_merged.fastq.gz \
  -o megahit_F2R_KM41_out \
  --min-contig-len 500 \
  -t 8
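
MEGAHIT writes its contigs to final.contigs.fa inside the output directory. Before moving on to taxonomy, it can help to summarize each assembly. The following is a minimal sketch using awk and sort; the function names are hypothetical, and N50 is computed here as the length of the contig at which the running total of sorted contig lengths first reaches half of the assembly size.

```shell
# Print one length per contig from a FASTA file (handles wrapped sequences).
contig_lengths() {
  awk '/^>/ { if (seen) print len; len = 0; seen = 1; next }
       { len += length($0) }
       END { if (seen) print len }' "$1"
}

# Summarize an assembly: contig count, total bases, longest contig, N50.
contig_stats() {
  contig_lengths "$1" | sort -rn | awk '
    { lens[NR] = $1; total += $1 }
    END {
      if (NR == 0) { print "contigs=0"; exit }
      half = total / 2
      # Lengths arrive sorted longest first; walk until half the bases are covered.
      for (i = 1; i <= NR; i++) {
        run += lens[i]
        if (run >= half) { n50 = lens[i]; break }
      }
      printf "contigs=%d total_bp=%.0f longest=%d N50=%d\n", NR, total, lens[1], n50
    }'
}

# Example calls for the two output directories used above:
# contig_stats megahit_F1B_KM40_out/final.contigs.fa
# contig_stats megahit_F2R_KM41_out/final.contigs.fa
```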

Results

Phylogenetic trees are constructed using MEGAN. The result shown here is for F2R, the rhizosphere soil. The code and results for F1B will be included in the next update.

Rhizosphere soil top hits

Table 1: Rhizosphere soil top hits
Organism                         Classification
Acidobacteria bacterium          Bacteria
Alphaproteobacteria bacterium    Bacteria
Betaproteobacteria bacterium     Bacteria
Verrucomicrobia bacterium        Bacteria
Actinobacteria bacterium         Bacteria

Plot: Contig length

To better visualize our results, we will plot a histogram of contig lengths for this F2R rhizosphere soil sample.
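
As a starting point for that plot, the bin counts behind the histogram can be tabulated directly from the assembly FASTA. This is only a sketch: the function name and the default bin width of 500 bp are assumptions, and the two-column output (bin start, contig count) is meant to be passed to whichever plotting tool is used for the final figure.

```shell
# Tabulate contig lengths into fixed-width bins.
# Output: "<bin start>\t<number of contigs>", sorted by bin start.
# ASSUMPTION: the default bin width of 500 bp is an arbitrary illustrative choice.
contig_length_histogram() {
  local fasta="$1"
  local bin="${2:-500}"
  awk -v bin="$bin" '
    /^>/ { if (seen) count[int(len / bin)]++; len = 0; seen = 1; next }
    { len += length($0) }
    END {
      if (seen) count[int(len / bin)]++
      for (b in count) printf "%d\t%d\n", b * bin, count[b]
    }' "$fasta" | sort -n
}

# Example call for the rhizosphere assembly:
# contig_length_histogram megahit_F2R_KM41_out/final.contigs.fa 500
```

Because the assembly was run with --min-contig-len 500, the first populated bin should start at 500 bp.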

Next 4 weeks

  • Taxonomy for F1B (bulk soil)
  • Annotation via MG-RAST
  • Visualizations
  • Plan for remaining 46 metagenomes after course completion