Continued to review Circos documentation.
- Thought about usefulness (uselessness?) of links for our use case. Possibly interesting to link genes with same GOslims?
Worked on CEABIGR/Circos organization. Should Circos be in its own branch? Does this simplify or complicate things for usage? Is there a better way to keep the recommended Circos organization, but set up working directories outside of the Circos program install location?
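One possible answer to the working-directory question is a separate project directory per analysis, kept outside the Circos install. This is a minimal sketch. The make_circos_project helper and its etc/, data/, results/ layout are hypothetical choices of mine, not something Circos requires. The helper only creates the directories and prints the render command, using the Circos -conf, -outputdir, and -outputfile options.

```shell
# Hypothetical helper: create a Circos project directory outside the install
# location, then print (not run) the command that would render from it.
make_circos_project() {
  local project_dir="$1"
  # etc/ for .conf files, data/ for karyotype and tracks, results/ for images
  mkdir -p "${project_dir}/etc" "${project_dir}/data" "${project_dir}/results"
  # Assumes circos is on PATH when the printed command is actually run
  echo "circos -conf ${project_dir}/etc/circos.conf -outputdir ${project_dir}/results -outputfile circos.png"
}
```

Printing the command rather than running it keeps the sketch safe to try on a machine where Circos isn't installed.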
- Read Ch. 4 of “The Disordered Cosmos”
Helped reinstall Stacks on Mox for Marta (GitHub Issue)
- Worked extensively on Circos stuff for CEABIGR project again. Primarily generating Circos-formatted gene expression/methylation files. See CEABIGR
Worked extensively on Circos stuff for CEABIGR project. Had to fix some R code, file formatting, and bash scripts, as well as learn how to manipulate the
file. Actually generated a plot:
Circos plot showing the C. virginica chromosome NC_035781.1 as the black outer ring, with control female mean gene expression values (black inner ring) and exposed female mean gene expression values (green inner ring).
pipeline completed, so transferred data and wrote up notebook entry.
Generated CEABIGR mean gene expression files, formatted for Circos usage.
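The expression-to-Circos formatting can be sketched as a join of per-gene means onto gene coordinates. Circos data tracks use whitespace-separated "chr start end value" lines. The input layouts here are assumptions: a BED file with gene IDs in column 4 and a two-column gene_id/mean table. The expression_to_circos name is hypothetical. The chromosome prefix (e.g. cvir) mirrors the awk one-liner used for the gene BED file so IDs match the karyotype.

```shell
# Hypothetical inputs:
#   genes.bed  chrom<TAB>start<TAB>end<TAB>gene_id
#   means.tab  gene_id<TAB>mean_expression
# Output: Circos data track lines "chr start end value"
expression_to_circos() {
  local prefix="$1" bed="$2" means="$3"
  awk -v prefix="${prefix}" 'BEGIN {FS = "\t"; OFS = " "}
    NR == FNR {mean[$1] = $2; next}   # first file: store gene -> mean value
    ($4 in mean) {print prefix $1, $2, $3, mean[$4]}' "${means}" "${bed}"
}
```

Genes without a mean value are silently dropped, which is worth checking against the expected gene count.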
pipeline on Mox for Crassostrea virginica (Eastern oyster) because I forgot that the pipeline has an artificial cap of 10 BAMs. Edited the nextflow.config file to increase the cap to 50, and added code in the bio_DNA-methylation.md doc to count the number of BAMs and tell the epidiverse/snp command to use that number of BAMs. This prevents the artificial limit from having any impact.
Science Hour. Helped Matt with some Mox/Trinity (system
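The BAM-counting step from the epidiverse/snp entry above might look something like this. The actual code in bio_DNA-methylation.md isn't reproduced here, so count_bams is a hypothetical helper. The sketch stops at producing the number and does not guess the pipeline parameter it gets passed to.

```shell
# Hypothetical helper: count BAM files directly inside a directory.
# Index files (*.bam.bai) and subdirectories are not counted.
count_bams() {
  local bam_dir="$1"
  find "${bam_dir}" -maxdepth 1 -type f -name "*.bam" | wc -l | tr -d ' '
}
```

Usage would be along the lines of n_bams=$(count_bams /path/to/bams), then passing "${n_bams}" to the pipeline command.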
Updated ceabigr predominant isoform R Markdown (GitHub) to generate files for control/exposed female samples.
Due to an issue with R loading a package, I was getting this error:
libgsl.so.23: cannot open shared object file: No such file or directory
I came across some suggestions that this can happen if your current version of R was built on a previous version of Ubuntu (which happens to be my case, since I upgraded Ubuntu earlier this month), so I decided to upgrade and build R. Took a while to compile… I also ended up having to re-install packages. Ugh!
Initiated
pipeline on Mox for Crassostrea virginica (Eastern oyster) Bismark BAMs, per this GitHub Issue.
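For errors like the libgsl.so.23 one noted above, ldd can list which shared libraries a compiled R package object can no longer resolve. This is a minimal diagnostic sketch, and missing_libs is a hypothetical name. The path to the package's .so file depends on where the R library lives.

```shell
# Hypothetical helper: print unresolved shared-library dependencies of a
# compiled object. ldd reports these as "<lib> => not found".
missing_libs() {
  local shared_object="$1"
  ldd "${shared_object}" 2>/dev/null | grep "not found" || true
}
```

For example, missing_libs /path/to/R/library/<package>/libs/<package>.so should print nothing once R and the package have been rebuilt against the current system libraries.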
Updated ceabigr predominant isoform R Markdown (GitHub) to include vectors containing sample types, as well as generate files for control/exposed male samples.
Finished notebook entry for geoduck HISAT2 alignments for lncRNA.
Read more Circos info.
ceabigr meeting with Yaamini. Decided to generate a list of genes with differing predominant isoforms between females and males; these were the only two comparisons Steven had produced so far. The list will also use a binary system (i.e. 0 or 1 to indicate not different or different, respectively).
- Generated file(s). See this Jupyter Notebook (GitHub).
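The binary female/male comparison can be sketched with awk. The real files were generated in the linked Jupyter Notebook. This sketch assumes a hypothetical input layout: one tab-separated gene_id/predominant_isoform file per sex. compare_isoforms is also a hypothetical name.

```shell
# Hypothetical inputs: gene_id<TAB>predominant_isoform, one file per sex.
# Output: gene_id<TAB>0 (same isoform) or gene_id<TAB>1 (different isoform)
compare_isoforms() {
  local females="$1" males="$2"
  awk 'BEGIN {FS = OFS = "\t"}
       NR == FNR {female[$1] = $2; next}   # first file: gene -> female isoform
       ($1 in female) {print $1, (female[$1] == $2 ? 0 : 1)}' "${females}" "${males}"
}
```

Genes present in only one of the two files are skipped rather than being assigned a 0 or 1.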
Did I finally fix the SBATCH script for geoduck Hisat2 alignments? It’s looking that way…
CEABIGR Circos stuff
awk 'BEGIN {OFS="\t"} {print "cvir"$1, $2, $3}' C_virginica-3.0_Gnomon_genes.bed > C_virginica-3.0_Gnomon_genes.bed.circos
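The "cvir" prefix added by the awk command above only helps if it matches the IDs in the karyotype file, which uses the format "chr - ID LABEL START END COLOR". A quick consistency check can be sketched as follows. unmatched_ids is a hypothetical helper.

```shell
# Hypothetical helper: print chromosome IDs used in a Circos data file
# (first column) that do not appear as IDs in the karyotype file.
unmatched_ids() {
  local karyotype="$1" data="$2"
  awk 'NR == FNR {if ($1 == "chr") ids[$3] = 1; next}   # karyotype "chr" lines
       NF && !($1 in ids) {print $1}' "${karyotype}" "${data}" | sort -u
}
```

Empty output means every data line refers to a chromosome Circos knows about.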
Continued reading documentation.
Got distracted exploring how to get a list of all existing GO IDs. The idea is to then map all GOslims to the GO IDs and create a "flat" file that can be used for joining. Mostly reading about using the
library in R and how to extract information.
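As an alternative to the R route, a list of GO IDs can also be pulled straight from a GO ontology file in OBO format (e.g. go-basic.obo). This is a swapped-in technique, not the approach described above. go_ids_from_obo is a hypothetical name. The sketch takes only primary "id:" lines, so alt_id lines and non-GO Typedef stanzas are skipped, but obsolete terms are still included.

```shell
# Hypothetical helper: extract primary GO term IDs from an OBO file.
go_ids_from_obo() {
  local obo="$1"
  awk '/^id: GO:/ {print $2}' "${obo}"
}
```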
Continued troubleshooting SBATCH script for geoduck Hisat2 alignments. Despite declaring success yesterday, I realized the script still wasn't running properly. Grrrrrr….
Retrieved GOslims for single cell RNAseq (scRNAseq) project, per this GitHub Issue.
Permanently fixed SBATCH script for geoduck Hisat2 alignments and successfully ran it, as part of the lncRNA identification.
Installed Circos on my computer (VM Ubuntu 22.04LTS):
Couldn’t install needed library:
sudo apt-get -y install libgd2-xpm-dev
E: Unable to locate package libgd2-xpm-dev
Possible fix is:
sudo apt-get -y install libgd-dev
Missing Perl modules:
sam@computer:~/programs/circos-0.69-9/bin$ ./circos -modules
ok 1.52 Carp
ok 0.45 Clone
missing Config::General
ok 3.80 Cwd
ok 2.179 Data::Dumper
ok 2.58 Digest::MD5
ok 2.85 File::Basename
ok 3.80 File::Spec::Functions
ok 0.2311 File::Temp
ok 1.52 FindBin
missing Font::TTF::Font
missing GD
missing GD::Polyline
ok 2.52 Getopt::Long
ok 1.46 IO::File
missing List::MoreUtils
ok 1.55 List::Util
missing Math::Bezier
ok 1.999818 Math::BigFloat
missing Math::Round
missing Math::VecStat
ok 1.03_01 Memoize
ok 1.97 POSIX
missing Params::Validate
ok 2.01 Pod::Usage
missing Readonly
missing Regexp::Common
missing SVG
missing Set::IntSpan
missing Statistics::Basic
ok 3.23 Storable
ok 1.23 Sys::Hostname
ok 2.04 Text::Balanced
missing Text::Format
ok 1.9767 Time::HiRes
Needed to install cpanminus:
sudo apt-get install cpanminus
sudo cpanm Clone Config::General Font::TTF::Font GD GD::Polyline List::MoreUtils Math::Bezier Math::Round Math::VecStat Params::Validate Readonly Regexp::Common SVG Set::IntSpan Statistics::Basic Text::Format
Got me to this:
sam@computer:~/programs/circos-0.69-9/bin$ ./circos -modules
ok 1.52 Carp
ok 0.45 Clone
ok 2.65 Config::General
ok 3.80 Cwd
ok 2.179 Data::Dumper
ok 2.58 Digest::MD5
ok 2.85 File::Basename
ok 3.80 File::Spec::Functions
ok 0.2311 File::Temp
ok 1.52 FindBin
ok 0.39 Font::TTF::Font
ok 2.76 GD
ok 0.2 GD::Polyline
ok 2.52 Getopt::Long
ok 1.46 IO::File
ok 0.430 List::MoreUtils
ok 1.55 List::Util
ok 0.01 Math::Bezier
ok 1.999818 Math::BigFloat
ok 0.07 Math::Round
ok 0.08 Math::VecStat
ok 1.03_01 Memoize
ok 1.97 POSIX
ok 1.30 Params::Validate
ok 2.01 Pod::Usage
ok 2.05 Readonly
ok 2017060201 Regexp::Common
ok 2.87 SVG
ok 1.19 Set::IntSpan
ok 1.6611 Statistics::Basic
ok 3.23 Storable
ok 1.23 Sys::Hostname
ok 2.04 Text::Balanced
ok 0.62 Text::Format
ok 1.9767 Time::HiRes
Successfully generated test images!
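The list of Perl modules to hand to cpanm could also be pulled from the circos -modules report instead of typed out. This assumes each report line starts with "ok" or "missing" and names the module in the last column, which matches the output above. missing_perl_modules is a hypothetical helper.

```shell
# Hypothetical helper: read a "circos -modules" report on stdin and print
# the names of modules flagged as missing, one per line.
missing_perl_modules() {
  awk '$1 == "missing" {print $2}'
}
```

Usage sketch: ./circos -modules | missing_perl_modules | xargs sudo cpanm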
Tidied up a bunch of things in the SBATCH script for geoduck RNAseq HiSat alignments in order to get it to run properly. Currently waiting in queue…
Updated the circos_pgen_karyotype.sh bash script (https://github.com/RobertsLab/sams-notebook/blob/master/bash_scripts/circos_pgen_karyotype.sh) to allow passing arguments for species abbreviation and FastA index file.
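The core of a karyotype-from-FastA-index step can be sketched as below. This is not the contents of circos_pgen_karyotype.sh. A FastA index (.fai) lists the sequence name and length in its first two columns, and a Circos karyotype line is "chr - ID LABEL START END COLOR". The fai_to_karyotype name and the fixed "black" color are assumptions. The species abbreviation is prepended to form the ID, mirroring the "cvir" prefix used elsewhere.

```shell
# Hypothetical helper: build Circos karyotype lines from a .fai file.
# .fai columns: name, length, offset, line bases, line width
fai_to_karyotype() {
  local species_abbrev="$1" fasta_index="$2"
  awk -v abbrev="${species_abbrev}" \
    '{print "chr", "-", abbrev $1, $1, 0, $2, "black"}' "${fasta_index}"
}
```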
Finished (I hope!) SBATCH script for geoduck RNAseq HiSat alignments for eventual lncRNA id. Lots of ins and outs and what-have-yous for this thing… Will execute tomorrow when Mox is back online (offline today for maintenance).
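The SBATCH script itself isn't reproduced here. A common sticking point in scripts like this is pairing R1/R2 files correctly, so this dry-run sketch prints one HISAT2 command per read pair instead of running anything. The file naming (<sample>_R1.fastq.gz / <sample>_R2.fastq.gz) and the hisat2_commands name are hypothetical. The -x, -1, -2, -p, and -S options are standard HISAT2 options.

```shell
# Hypothetical helper: print (dry run) one hisat2 command per read pair.
hisat2_commands() {
  local index="$1" reads_dir="$2" threads="${3:-8}"
  local r1 r2 base sample
  for r1 in "${reads_dir}"/*_R1.fastq.gz; do
    [ -e "${r1}" ] || continue             # glob matched nothing
    base="${r1%_R1.fastq.gz}"              # strip the R1 suffix
    r2="${base}_R2.fastq.gz"
    if [ ! -e "${r2}" ]; then
      echo "No R2 file for ${r1}" >&2
      continue
    fi
    sample="$(basename "${base}")"
    echo "hisat2 -x ${index} -1 ${r1} -2 ${r2} -p ${threads} -S ${sample}.sam"
  done
}
```

Checking the printed commands before submitting the job is cheaper than waiting in the Mox queue to find a pairing mistake.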
Oyster gene expression (ceabigr) meeting with Steven & Yaamini.
Need to make some Circos plots.
Continued work on SBATCH script for geoduck RNAseq HiSat alignments for eventual lncRNA id.
Started working on SBATCH script for geoduck RNAseq HiSat alignments for eventual lncRNA id.
Transferred data to Mox, prepared the SBATCH script, and trimmed geoduck RNAseq data in preparation for identifying long non-coding RNA.
Computer maintenance: upgraded my laptop Ubuntu virtual machine from 20.04LTS to 22.04LTS.
ProCard reconciliation.
Transferred geoduck RNAseq files to Mox in preparation for long non-coding RNA identification.
Computer maintenance: backed up my laptop virtual machine (took a very long time) to my home server.
Helped with Zach’s GitHub Issue regarding PCR caps vs. films.
Helped with Marta’s GitHub Issue with running a support script in
.
ProCard reconciliation.
Technical issue with accessing the Lab Safety Report Dashboard. Resolved late this afternoon.
- Continued to mess around with mixOmics R package tutorials. Here's the R Markdown used, as well as some of the plots that were produced (PCA and sparse PCA). Overall, sex appears to be the driving factor behind differences in gene expression (FPKM) and average gene methylation.
Put together a quick R Markdown doc to go through the tutorial using our gene FPKM expression data and average gene methylation data (generated by Steven):
Some screenshots of plots generated: