Goal: create a presence-absence plot of genes in Marinimicrobia metagenome-assembled genomes (MAGs).
Workflow to accomplish this goal:
The bin files contain the names of the contigs and reads within them.
sulf_bin <- read.csv("../data/Bins/assembly_plus_bins/assembly_plus_bin.4", header = FALSE)
head(sulf_bin)
V1
1 MG1058_s15.ctg000016c
2 MG1058_s273.ctg000288c
3 MG1058_s408.ctg000428c
I then extracted the FASTA sequences for each bin so that I could compute completeness and contamination statistics.
This is part of the code I used to identify Marinimicrobia bins from the CAT/BAT annotations of each contig.
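Pulling a bin's sequences out of an assembly comes down to keeping the FASTA records whose IDs appear in the bin file. The following is a minimal base R sketch of that idea, not the exact procedure used here; `subset_fasta` is a hypothetical helper and the toy sequences are made up.

```r
# Hypothetical helper: keep only FASTA records whose header ID is in `contig_ids`.
subset_fasta <- function(fasta_lines, contig_ids) {
  is_header <- startsWith(fasta_lines, ">")
  # Record number for every line; it increases by one at each header
  record <- cumsum(is_header)
  # ID = header text up to the first whitespace, without the leading ">"
  ids <- sub("^>([^[:space:]]+).*$", "\\1", fasta_lines[is_header])
  keep <- ids %in% contig_ids
  # Drop anything before the first header, then keep lines of wanted records
  fasta_lines[record > 0 & keep[pmax(record, 1)]]
}

# Toy assembly (made-up sequences); contig names follow the bin file format
assembly <- c(
  ">MG1058_s15.ctg000016c", "ACGT", "GGCC",
  ">other_contig", "TTTT",
  ">MG1058_s273.ctg000288c len=4", "CCCC"
)
bin_ids <- c("MG1058_s15.ctg000016c", "MG1058_s273.ctg000288c")

bin_fasta <- subset_fasta(assembly, bin_ids)
print(bin_fasta)
```

With real data, `assembly` would come from `readLines()` on the assembly FASTA, `bin_ids` from the `V1` column of the bin file, and the result would be written out with `writeLines()`.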
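library(dplyr)
library(stringr)
# Sum the supporting ORFs for each species label across the bin's contigs
consensus <- all_contigs %>%
  group_by(species) %>%
  summarize(support = sum(ORFs_true)) %>%
  filter(species != "no support")
# Collapse any label that mentions Marinimicrobia into a single name
index <- which(str_detect(consensus$species, "Marinimicrobia"))
consensus$species[index] <- "Marinimicrobia"
# Keep the label(s) with the highest ORF support as the consensus
consensus <- consensus$species[which(consensus$support == max(consensus$support))]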
I used BUSCO to estimate the completeness and contamination of the bins.
The BUSCO results were combined with the clade names in a summary table.
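Combining the two sources can be sketched as a left join on the bin name. The example below uses base R `merge()` on toy data; the bin names and values are invented for illustration, and the real tables have more columns.

```r
# Toy BUSCO results (made-up values; only the column names mirror the table)
busco <- data.frame(
  bin = c("bin.1", "bin.2", "bin.3"),
  Complete = c(92.7, 45.2, 0.8),
  Missing = c(4.0, 50.0, 98.4),
  stringsAsFactors = FALSE
)

# Toy clade assignments; bin.3 has no consensus clade
clades <- data.frame(
  bin = c("bin.1", "bin.2"),
  clade = c("Marinimicrobia", "Sulfitobacter"),
  stringsAsFactors = FALSE
)

# Left join: keep every BUSCO row, fill clade with NA where none was assigned
summary_table <- merge(busco, clades, by = "bin", all.x = TRUE)
print(summary_table)
```

Using `all.x = TRUE` keeps bins without a clade assignment in the table, which is why a row can end with `NA` in the `clade` column.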
bin | sample | Dataset | Complete | Single | Duplicated | Fragmented | Missing | n_markers | Scaffold.N50 | Contigs.N50 | Percent.gaps | Number.of.scaffolds | sum_len | clade |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
24_sample_bam_bin.118 | 24_sample_bam_bins | bacteria_odb10 | 0.8 | 0.8 | 0 | 0.8 | 98.4 | 124 | 2280 | 2280 | 0.000% | 782 | 821700 | NA |
I used anvi'o to visualize the MAGs (that is, the bins) and to inspect their GC content and coverage.
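# Count, for every contig, how many KO annotations fall in each pathway
# Start from an empty result (assumed; the initialization is not shown in this excerpt)
pathway <- NULL
for (one_contig in unique(ko$contig)) {
  ko_small <- subset(ko, contig == one_contig)
  results <- NULL
  for (i in seq_along(all_paths)) {
    path <- all_paths[i]
    results$count[i] <- length(which(ko_small$pathway == path))
    results$path[i] <- path
  }
  results <- as.data.frame(results)
  results$bin <- one_contig
  pathway <- rbind(pathway, results)
}
# Convert the counts into a presence/absence flag
pathway$presence <- "no"
index <- which(pathway$count > 0)
pathway$presence[index] <- "yes"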
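The same counts can also be obtained without explicit loops by cross-tabulating contigs against pathways. This is an alternative sketch on toy data, not the code used for the results here; the contig and pathway names are made up.

```r
# Toy KO annotations (hypothetical contigs and pathways)
ko <- data.frame(
  contig = c("ctgA", "ctgA", "ctgA", "ctgB"),
  pathway = c("sulfur", "sulfur", "nitrogen", "nitrogen"),
  stringsAsFactors = FALSE
)
all_paths <- c("sulfur", "nitrogen", "carbon")

# Cross-tabulate; fixing the factor levels keeps pathways with zero hits
counts <- table(
  bin = ko$contig,
  path = factor(ko$pathway, levels = all_paths)
)

# Long format: one row per bin/pathway pair with its count
pathway <- as.data.frame(counts, responseName = "count", stringsAsFactors = FALSE)
pathway$presence <- ifelse(pathway$count > 0, "yes", "no")
print(pathway)
```

Setting the factor levels to `all_paths` matters: without it, pathways that never occur in `ko` would be dropped rather than reported as absent.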
This shows the presence-absence plot for Sulfitobacter.
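A presence-absence plot of this kind can be drawn as a simple grid of filled and empty cells. The sketch below uses base R `image()` on toy data and only illustrates the idea; the original figure may have been produced differently (for example with ggplot2), and the bin and pathway names are invented.

```r
# Toy long-format table in the same shape as `pathway` (made-up values)
pathway <- data.frame(
  bin = rep(c("bin.1", "bin.2"), times = 3),
  path = rep(c("sulfur", "nitrogen", "carbon"), each = 2),
  presence = c("yes", "no", "yes", "yes", "no", "no"),
  stringsAsFactors = FALSE
)

# Rows = bins, columns = pathways; 1 = present, 0 = absent
pa <- with(pathway, tapply(presence == "yes", list(bin, path), any)) * 1

# image() expects z with one row per x value, hence the transpose
image(
  x = seq_len(ncol(pa)), y = seq_len(nrow(pa)), z = t(pa),
  zlim = c(0, 1), col = c("white", "steelblue"),
  axes = FALSE, xlab = "Pathway", ylab = "Bin"
)
axis(1, at = seq_len(ncol(pa)), labels = colnames(pa))
axis(2, at = seq_len(nrow(pa)), labels = rownames(pa), las = 1)
box()
```

Filled cells mark pathways with at least one annotated gene in the bin, so gaps in a pathway stand out as empty cells along a row.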
In the next few weeks, I plan to refine the Sulfitobacter bins and get presence-absence gene plots of the “best” Marinimicrobia bins.