{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Aspen Coyle\n", "\n", "afcoyle@uw.edu\n", "\n", "Roberts Lab, UW-SAFS\n", "\n", "2021-05-05" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this notebook, we will merge two sets of kallisto libraries - cbai_transcriptomev4.0 and hemat_transcriptomev1.6. They contain only _C. bairdi_ and only _Hematodinium_ sequences, respectively. Information on their creation is available [on the Roberts Lab Genomic Resources page](https://robertslab.github.io/resources/Genomic-Resources/).\n", "\n", "The per-library appends below could be done more neatly inside a for loop - when I'm cleaning this up, I'll go through and figure that whole process out." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "# Make new directory for kallisto libraries\n", "!mkdir ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# Copy cbai_transcriptomev4.0 over to new folder\n", "!cp -r ../output/kallisto_libraries/cbai_transcriptomev4.0/. 
../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "# Remove all .h5 and .json files, since we won't be appending those\n", "# Also remove the checksums and std_errortracking files, as neither applies to the combined libraries\n", "!rm ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/*/*.h5\n", "!rm ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/*/*.json\n", "!rm ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/checksums.md5\n", "!rm ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/std_errortracking.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Appending Hemat Libraries to Cbai Libraries" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "# ID 072\n", "!cat ../output/kallisto_libraries/hemat_transcriptomev1.6/id072/abundance.tsv \\\n", ">> ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id072/abundance.tsv" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 88303 ../output/kallisto_libraries/cbai_transcriptomev4.0/id072/abundance.tsv\n", " 1412255 ../output/kallisto_libraries/cbaihemat_transcriptomev2.0/id072/abundance.tsv\n", " 94480 ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id072/abundance.tsv\n", " 6177 ../output/kallisto_libraries/hemat_transcriptomev1.6/id072/abundance.tsv\n", " 1601215 total\n" ] } ], "source": [ "# Check by looking at number of lines - combined (94480) should equal cbai (88303) + hemat (6177). All good!\n", "! 
wc -l ../output/kallisto_libraries/*/id072/abundance.tsv" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "# ID 118\n", "!cat ../output/kallisto_libraries/hemat_transcriptomev1.6/id118/abundance.tsv \\\n", ">> ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id118/abundance.tsv" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "# ID 127\n", "!cat ../output/kallisto_libraries/hemat_transcriptomev1.6/id127/abundance.tsv \\\n", ">> ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id127/abundance.tsv" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "# ID 132\n", "!cat ../output/kallisto_libraries/hemat_transcriptomev1.6/id132/abundance.tsv \\\n", ">> ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id132/abundance.tsv" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "# ID 151\n", "!cat ../output/kallisto_libraries/hemat_transcriptomev1.6/id151/abundance.tsv \\\n", ">> ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id151/abundance.tsv" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "# ID 173\n", "!cat ../output/kallisto_libraries/hemat_transcriptomev1.6/id173/abundance.tsv \\\n", ">> ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id173/abundance.tsv" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "# ID 178\n", "!cat ../output/kallisto_libraries/hemat_transcriptomev1.6/id178/abundance.tsv \\\n", ">> ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id178/abundance.tsv" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "# ID 254\n", "!cat ../output/kallisto_libraries/hemat_transcriptomev1.6/id254/abundance.tsv \\\n", ">> ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id254/abundance.tsv" ] }, { 
"cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "# ID 272\n", "!cat ../output/kallisto_libraries/hemat_transcriptomev1.6/id272/abundance.tsv \\\n", ">> ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id272/abundance.tsv" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "# ID 280\n", "!cat ../output/kallisto_libraries/hemat_transcriptomev1.6/id280/abundance.tsv \\\n", ">> ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id280/abundance.tsv" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "# ID 294\n", "!cat ../output/kallisto_libraries/hemat_transcriptomev1.6/id294/abundance.tsv \\\n", ">> ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id294/abundance.tsv" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "# ID 334\n", "!cat ../output/kallisto_libraries/hemat_transcriptomev1.6/id334/abundance.tsv \\\n", ">> ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id334/abundance.tsv" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "# ID 349\n", "!cat ../output/kallisto_libraries/hemat_transcriptomev1.6/id349/abundance.tsv \\\n", ">> ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id349/abundance.tsv" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "# ID 359\n", "!cat ../output/kallisto_libraries/hemat_transcriptomev1.6/id359/abundance.tsv \\\n", ">> ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id359/abundance.tsv" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "# ID 445\n", "!cat ../output/kallisto_libraries/hemat_transcriptomev1.6/id445/abundance.tsv \\\n", ">> ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id445/abundance.tsv" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, 
"outputs": [], "source": [ "# ID 463\n", "!cat ../output/kallisto_libraries/hemat_transcriptomev1.6/id463/abundance.tsv \\\n", ">> ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id463/abundance.tsv" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [], "source": [ "# ID 481\n", "!cat ../output/kallisto_libraries/hemat_transcriptomev1.6/id481/abundance.tsv \\\n", ">> ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id481/abundance.tsv" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "# ID 485\n", "!cat ../output/kallisto_libraries/hemat_transcriptomev1.6/id485/abundance.tsv \\\n", ">> ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id485/abundance.tsv" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "# ID 380821\n", "!cat ../output/kallisto_libraries/hemat_transcriptomev1.6/id380821/abundance.tsv \\\n", ">> ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id380821/abundance.tsv" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "# ID 380823\n", "!cat ../output/kallisto_libraries/hemat_transcriptomev1.6/id380823/abundance.tsv \\\n", ">> ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id380823/abundance.tsv" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "# ID 380825\n", "!cat ../output/kallisto_libraries/hemat_transcriptomev1.6/id380825/abundance.tsv \\\n", ">> ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id380825/abundance.tsv" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 94480 ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id072/abundance.tsv\n", " 94480 ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id118/abundance.tsv\n", " 94480 
../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id127/abundance.tsv\n", " 94480 ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id132/abundance.tsv\n", " 94480 ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id151/abundance.tsv\n", " 94480 ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id173/abundance.tsv\n", " 94480 ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id178/abundance.tsv\n", " 94480 ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id254/abundance.tsv\n", " 94480 ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id272/abundance.tsv\n", " 94480 ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id280/abundance.tsv\n", " 94480 ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id294/abundance.tsv\n", " 94480 ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id334/abundance.tsv\n", " 94480 ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id349/abundance.tsv\n", " 94480 ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id359/abundance.tsv\n", " 94480 ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id380821/abundance.tsv\n", " 94480 ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id380823/abundance.tsv\n", " 94480 ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id380825/abundance.tsv\n", " 94480 ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id445/abundance.tsv\n", " 94480 ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id463/abundance.tsv\n", " 94480 ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id481/abundance.tsv\n", " 94480 ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id485/abundance.tsv\n", " 1984080 total\n" ] } ], "source": [ "# Check line counts again. 
All should be the same - 94480 (88303 cbai + 6177 hemat lines, as seen for id072) - since we're just combining libraries.\n", "# Note that 94480 includes the hemat header row, which cat appends along with the data.\n", "# Matching line counts don't mean all libraries contain the same values - many transcripts will have zero counts in a given library!\n", "!wc -l ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/id*/abundance.tsv" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [], "source": [ "# Produce checksum file for new libraries\n", "!md5sum ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/*/* > ../output/kallisto_libraries/cbaiv4.0_hematv1.6_combined/checksums.md5" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" } }, "nbformat": 4, "nbformat_minor": 4 }