{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Step 5: Analyze final Barnacle model\n", "\n", "Use this notebook to compile and analyze the final version of your Barnacle model. This should be the version of the model that is fit with the optimal parameters you identified in step 4. This notebook has three parts:\n", "1. Align the components between bootstraps of your final model.\n", " - The order of components is not fixed in this tensor decomposition model. Therefore, to compare bootstraps, the components must first be aligned to one another.\n", " - The aligned bootstraps will be saved as an xarray.Dataset so that you can access them for further analysis.\n", "1. Summarize the model weights for each component.\n", " - Each component can be understood to model a different pattern in the data. Depending on how you set up your data and your Barnacle model, each pattern might also be associated with a different cluster (e.g. gene clusters). This step separates out each component so you can more closely examine the pattern and/or cluster each is modeling.\n", "1. Visualize your model.\n", " - Effective visualization depends on your data type, size, dimensions, and the questions you are asking. A few potential visualizations are suggested below to help get you started."
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# imports\n", "\n", "import itertools\n", "import numpy as np\n", "import os\n", "import pandas as pd\n", "import seaborn as sns\n", "import tensorly as tl\n", "import tlviz\n", "import xarray as xr\n", "\n", "from barnacle.tensors import SparseCPTensor\n", "from barnacle.utils import subset_cp_tensor\n", "from functools import reduce\n", "from matplotlib import pyplot as plt\n", "from tlab.cp_tensor import load_cp_tensor\n", "from tqdm.notebook import tqdm\n", "\n", "# set color palette\n", "sns.set_palette(sns.color_palette([\n", " '#9B5DE5', '#FFAC69', '#00C9AE', '#FD3F92', '#0F0A0A', '#959AB1', '#FFDB66', '#FFB1CA', '#63B9FF', '#4F1DD7'\n", "]))\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Part A: Align model bootstraps" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "File outputs of this notebook will be saved here: data/barnacle/model\n" ] } ], "source": [ "# USER INPUTS -- edit these variables as needed\n", "\n", "# path to directory where the outputs from your parameter search were saved (e.g. 'directory/barnacle/fitting/')\n", "fitpath = 'data/barnacle/fitting'\n", "\n", "# path to the normalized data tensor used to fit barnacle (e.g. 
'directory/data-tensor.nc')\n", "datapath = 'data/data-tensor.nc'\n", "\n", "# optimal rank parameter (number of components) used to fit your final model\n", "optimal_rank = 5\n", "\n", "# optimal lambda parameter (sparsity coefficient) used to fit your final model\n", "optimal_lambda = 1.0\n", "\n", "# number of bootstraps used for final model\n", "n_bootstraps = 100\n", "\n", "# output directory where files produced by this notebook will be saved\n", "# (a 'model' directory next to the fitting directory; str.strip() removes characters, not a suffix)\n", "outdir = os.path.join(os.path.dirname(fitpath.rstrip('/')), 'model')\n", "if not os.path.isdir(outdir):\n", "    os.makedirs(outdir)\n", "print(f\"File outputs of this notebook will be saved here: {outdir}\")\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "de6d09c83e9644ccb0e1e4cb7bdbcbc0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Extracting sample names: 0%| | 0/100 [00:00, ?it/s]" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "00bb5429919643ffa353bffa544ffb0d", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Importing model bootstraps: 0%| | 0/100 [00:00, ?it/s]" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Successfully imported 100 model bootstraps, each with 3 replicates.\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "33e320463bbc4601af721c7d4ad3d79c", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Identifying best reference model from bootstraps: 0%| | 0/300 [00:00, ?it/s]" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "All bootstraps will be aligned to the following reference model:\n" ] }, { "data": { "text/html": [ "
<table>\n", "<thead><tr><th></th><th>reference_bootstrap</th><th>reference_replicate</th><th>mean_fms</th><th>median_fms</th></tr></thead>\n", "<tbody><tr><th>0</th><td>97</td><td>C</td><td>0.644711</td><td>0.65551</td></tr></tbody>\n", "</table>\n", "
<xarray.Dataset> Size: 243MB\n", "Dimensions: (bootstrap: 100, replicate: 3, component: 5,\n", " KOfam: 20069, phylum: 99, sample: 11)\n", "Coordinates:\n", " * bootstrap (bootstrap) int64 800B 0 1 2 3 4 5 ... 94 95 96 97 98 99\n", " * replicate (replicate) object 24B 'A' 'B' 'C'\n", " * component (component) int64 40B 1 2 3 4 5\n", " * KOfam (KOfam) <U6 482kB 'K00001' 'K00002' ... 'K26180' 'K26182'\n", " * phylum (phylum) <U30 12kB 'Acidobacteriota' ... 'Xanthophyceae'\n", " * sample (sample) object 88B 'G3.UW.ALL.L25S1' ... 'G3.UW.ALL.L...\n", "Data variables:\n", " KOfam_weights (bootstrap, replicate, component, KOfam) float64 241MB ...\n", " phylum_weights (bootstrap, replicate, component, phylum) float64 1MB ...\n", " sample_weights (bootstrap, replicate, component, sample) float64 132kB ...\n", " component_weights (bootstrap, replicate, component) float64 12kB 160.3 ....\n", " percent_variation (bootstrap, replicate, component) float64 12kB 0.4193 ...\n", " fms_component (bootstrap, replicate, component) float64 12kB 0.6783 ...\n", "Attributes:\n", " rank: 5\n", " lambda: 1.0\n", " n_bootstraps: 100\n", " align_ref_bootstrap: 97\n", " align_ref_replicate: C