============================================================
Present-Absent Variation (PAV) Analysis
============================================================
Samples: bc2041, bc2069, bc2070, bc2068, bc2071, bc2073, bc2072, bc2096
Output directory: /home/shared/16TB_HDD_01/sr320/github/project-lake-trout/analyses/11-pav
Reference genome: /home/shared/16TB_HDD_01/sr320/github/project-lake-trout/data/GCF_016432855.1_SaNama_1.0_genomic.fa

============================================================
Step 1: Setting up reference genome files
============================================================
Reference index already exists.
Running: Creating chromosome sizes file...
Chromosome sizes saved to: /home/shared/16TB_HDD_01/sr320/github/project-lake-trout/analyses/11-pav/genome.chrom.sizes

============================================================
Step 2: Calculating coverage with bedtools
============================================================
Running: Creating genome windows...
Created genome windows: /home/shared/16TB_HDD_01/sr320/github/project-lake-trout/analyses/11-pav/genome_windows.bed
Processing bc2041...
Running: bedtools coverage for bc2041...
Running: Compressing coverage file for bc2041...
Processing bc2069...
Running: bedtools coverage for bc2069...
Running: Compressing coverage file for bc2069...
Processing bc2070...
Running: bedtools coverage for bc2070...
Running: Compressing coverage file for bc2070...
Processing bc2068...
Running: bedtools coverage for bc2068...
Running: Compressing coverage file for bc2068...
Processing bc2071...
Running: bedtools coverage for bc2071...
Running: Compressing coverage file for bc2071...
Processing bc2073...
Running: bedtools coverage for bc2073...
Running: Compressing coverage file for bc2073...
Processing bc2072...
Running: bedtools coverage for bc2072...
Running: Compressing coverage file for bc2072...
Processing bc2096...
Running: bedtools coverage for bc2096...
Running: Compressing coverage file for bc2096...
Coverage calculation complete for all samples.

============================================================
Step 3: Identifying absent regions (zero coverage)
============================================================
Running: Merging absent regions for bc2041...
bc2041: 12171 absent regions
Running: Merging absent regions for bc2069...
bc2069: 39439 absent regions
Running: Merging absent regions for bc2070...
bc2070: 15040 absent regions
Running: Merging absent regions for bc2068...
bc2068: 23073 absent regions
Running: Merging absent regions for bc2071...
bc2071: 13856 absent regions
Running: Merging absent regions for bc2073...
bc2073: 12333 absent regions
Running: Merging absent regions for bc2072...
bc2072: 20837 absent regions
Running: Merging absent regions for bc2096...
bc2096: 8302 absent regions

============================================================
Step 4: Identifying present regions (with coverage)
============================================================
Running: Merging present regions for bc2041...
bc2041: 18705 present regions
Running: Merging present regions for bc2069...
bc2069: 59388 present regions
Running: Merging present regions for bc2070...
bc2070: 24402 present regions
Running: Merging present regions for bc2068...
bc2068: 45243 present regions
Running: Merging present regions for bc2071...
bc2071: 22203 present regions
Running: Merging present regions for bc2073...
bc2073: 19251 present regions
Running: Merging present regions for bc2072...
bc2072: 33458 present regions
Running: Merging present regions for bc2096...
bc2096: 13525 present regions

============================================================
Step 5: Extracting insertions and deletions from BAM files
============================================================
Extracting indels from bc2041...
Processed 100000 reads...
...
Processed 4400000 reads...
Running: Merging insertions for bc2041...
Running: Merging deletions for bc2041...
bc2041: 871505 insertions, 299138 deletions
Extracting indels from bc2069...
Processed 100000 reads...
...
Processed 1700000 reads...
Running: Merging insertions for bc2069...
Running: Merging deletions for bc2069...
bc2069: 340623 insertions, 216897 deletions
Extracting indels from bc2070...
Processed 100000 reads...
...
Processed 3700000 reads...
Running: Merging insertions for bc2070...
Running: Merging deletions for bc2070...
bc2070: 538102 insertions, 283745 deletions
Extracting indels from bc2068...
Processed 100000 reads...
...
Processed 3800000 reads...
Running: Merging insertions for bc2068...
Running: Merging deletions for bc2068...
bc2068: 331986 insertions, 261616 deletions
Extracting indels from bc2071...
Processed 100000 reads...
...
Processed 4200000 reads...
Running: Merging insertions for bc2071...
Running: Merging deletions for bc2071...
bc2071: 574215 insertions, 297017 deletions
Extracting indels from bc2073...
Processed 100000 reads...
...
Processed 4600000 reads...
Running: Merging insertions for bc2073...
Running: Merging deletions for bc2073...
bc2073: 652475 insertions, 307745 deletions
Extracting indels from bc2072...
Processed 100000 reads...
...
Processed 2900000 reads...
Running: Merging insertions for bc2072...
Running: Merging deletions for bc2072...
bc2072: 469081 insertions, 266578 deletions
Extracting indels from bc2096...
Processed 100000 reads...
...
Processed 7600000 reads...
Running: Merging insertions for bc2096...
Running: Merging deletions for bc2096...
bc2096: 1037158 insertions, 346973 deletions

============================================================
Step 6: Creating combined PAV feature files
============================================================
bc2041: 1182814 total PAV features
bc2069: 596959 total PAV features
bc2070: 836887 total PAV features
bc2068: 616675 total PAV features
bc2071: 885088 total PAV features
bc2073: 972553 total PAV features
bc2072: 756496 total PAV features
bc2096: 1392433 total PAV features

============================================================
Step 7: Intersecting PAV features with gene annotations
============================================================
Running: Intersecting absent regions with genes for bc2041...
Running: Intersecting deletions with genes for bc2041...
Running: Intersecting insertions with genes for bc2041...
bc2041 genes affected:
  - absent: 3827
  - deletions: 96950
  - insertions: 182303
Running: Intersecting absent regions with genes for bc2069...
Running: Intersecting deletions with genes for bc2069...
Running: Intersecting insertions with genes for bc2069...
bc2069 genes affected:
  - absent: 16891
  - deletions: 68872
  - insertions: 83965
Running: Intersecting absent regions with genes for bc2070...
Running: Intersecting deletions with genes for bc2070...
Running: Intersecting insertions with genes for bc2070...
bc2070 genes affected:
  - absent: 4980
  - deletions: 91814
  - insertions: 131585
Running: Intersecting absent regions with genes for bc2068...
Running: Intersecting deletions with genes for bc2068...
Running: Intersecting insertions with genes for bc2068...
bc2068 genes affected:
  - absent: 8382
  - deletions: 85772
  - insertions: 89539
Running: Intersecting absent regions with genes for bc2071...
Running: Intersecting deletions with genes for bc2071...
Running: Intersecting insertions with genes for bc2071...
bc2071 genes affected:
  - absent: 4338
  - deletions: 97553
  - insertions: 142150
Running: Intersecting absent regions with genes for bc2073...
Running: Intersecting deletions with genes for bc2073...
Running: Intersecting insertions with genes for bc2073...
bc2073 genes affected:
  - absent: 3827
  - deletions: 100512
  - insertions: 161865
Running: Intersecting absent regions with genes for bc2072...
Running: Intersecting deletions with genes for bc2072...
Running: Intersecting insertions with genes for bc2072...
bc2072 genes affected:
  - absent: 7575
  - deletions: 86301
  - insertions: 118415
Running: Intersecting absent regions with genes for bc2096...
Running: Intersecting deletions with genes for bc2096...
Running: Intersecting insertions with genes for bc2096...
bc2096 genes affected:
  - absent: 2165
  - deletions: 115615
  - insertions: 235718

============================================================
Step 8: Generating summary statistics
============================================================
Summary Statistics:
sample  n_absent_regions  absent_total_bp  n_deletions  deletion_total_bp  n_insertions
bc2041             12171         48321386       299138          117803687        871505
bc2069             39439        152882737       216897           73086849        340623
bc2070             15040         56549211       283745          100641753        538102
bc2068             23073         74469525       261616           77132396        331986
bc2071             13856         53025388       297017          107389611        574215
bc2073             12333         48837863       307745          115150161        652475
bc2072             20837         77419750       266578           92452663        469081
bc2096              8302         32230697       346973          140441338       1037158

Summary saved to: /home/shared/16TB_HDD_01/sr320/github/project-lake-trout/analyses/11-pav/pav_summary_stats.csv

============================================================
Step 9: Comparing PAV across samples
============================================================
Running: Running bedtools multiinter...
Running: Filtering regions absent in all samples...
Running: Filtering sample-specific absent regions...
Regions absent in all samples: 2572
Regions absent in only one sample: 72677

============================================================
PAV Analysis Complete!
============================================================
Output files are in: /home/shared/16TB_HDD_01/sr320/github/project-lake-trout/analyses/11-pav

Key output files per sample:
  - {sample}.pav_features.bed : Combined PAV features
  - {sample}.absent_regions_merged.bed : Zero-coverage regions
  - {sample}.insertions.bed : Insertions ≥50bp
  - {sample}.deletions.bed : Deletions ≥50bp
  - {sample}.genes_in_absent_regions.bed : Genes in absent regions

Cross-sample files:
  - pav_summary_stats.csv : Summary statistics
  - absent_regions_multiinter.bed : Multi-sample comparison
  - absent_in_all_samples.bed : Regions absent in all samples
  - absent_in_one_sample.bed : Sample-specific absent regions
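
Steps 3 and 4 of the log turn per-window coverage into merged absent (zero-coverage) and present regions, and Step 8 reports the total length of the absent regions. A minimal sketch of that idea in plain Python follows. It assumes each window is a hypothetical `(chrom, start, end, read_count)` tuple in half-open BED coordinates, treats "absent" as a read count of zero, and merges touching or overlapping windows the way `bedtools merge` does by default. The window size and the exact `bedtools coverage` column the pipeline tests are not shown in the log, so those details are assumptions.

```python
from typing import Iterable, List, Tuple

# Hypothetical window layout: (chrom, start, end, read_count), half-open BED coords.
Window = Tuple[str, int, int, int]
Interval = Tuple[str, int, int]


def merge_zero_coverage(windows: Iterable[Window]) -> List[Interval]:
    """Keep windows with zero reads, then merge book-ended or overlapping ones
    on the same chromosome (similar in spirit to `bedtools merge`)."""
    zero = sorted((c, s, e) for c, s, e, n in windows if n == 0)
    merged: List[Interval] = []
    for chrom, start, end in zero:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Same chromosome and touching/overlapping: extend the previous interval.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def total_bp(intervals: Iterable[Interval]) -> int:
    """Sum of interval lengths (end - start), as in an absent_total_bp column."""
    return sum(end - start for _, start, end in intervals)


if __name__ == "__main__":
    demo = [
        ("chr1", 0, 1000, 5),
        ("chr1", 1000, 2000, 0),
        ("chr1", 2000, 3000, 0),
        ("chr1", 3000, 4000, 2),
        ("chr2", 0, 1000, 0),
    ]
    regions = merge_zero_coverage(demo)
    print(regions)            # [('chr1', 1000, 3000), ('chr2', 0, 1000)]
    print(total_bp(regions))  # 3000
```

Present regions follow the same pattern with the filter inverted (`n > 0`).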
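
Step 5 extracts insertions and deletions from the BAM files, and the output listing describes them as ≥50bp. One common way to do this, shown here as an assumption rather than the pipeline's actual code, is to walk each read's CIGAR string from its 0-based reference start: `I` operations become insertions at the current reference position, `D` operations become deletions spanning the reference, and only `M`, `D`, `N`, `=` and `X` advance the reference coordinate. The function name `indels_from_cigar` and the choice to record an insertion as a zero-length point are hypothetical; with pysam, `AlignedSegment.reference_start` and `AlignedSegment.cigarstring` could supply the two inputs.

```python
import re
from typing import List, Tuple

# One CIGAR operation: a length followed by an op code (SAM specification).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# Ops that consume reference bases and therefore move the reference position.
REF_CONSUMING = set("MDN=X")

Event = Tuple[str, int, int, int]  # (kind, ref_start, ref_end, length)


def indels_from_cigar(ref_start: int, cigar: str, min_len: int = 50) -> List[Event]:
    """Return insertions/deletions of at least `min_len` bp from one alignment.

    Insertions are recorded as zero-length points (ref_start == ref_end) because
    they add bases not present in the reference; deletions span the removed
    reference bases. `N` (spliced/skipped) is not reported as a deletion.
    """
    events: List[Event] = []
    ref_pos = ref_start
    for length_str, op in CIGAR_RE.findall(cigar):
        length = int(length_str)
        if op == "I" and length >= min_len:
            events.append(("INS", ref_pos, ref_pos, length))
        elif op == "D" and length >= min_len:
            events.append(("DEL", ref_pos, ref_pos + length, length))
        if op in REF_CONSUMING:
            ref_pos += length
    return events


if __name__ == "__main__":
    # 60I and 75D pass the 50 bp threshold; the 5D does not.
    print(indels_from_cigar(1000, "100M60I200M75D10M5D50M"))
    # [('INS', 1100, 1100, 60), ('DEL', 1300, 1375, 75)]
```

Events from many reads would still need to be collapsed per sample, which is presumably what the "Merging insertions/deletions" lines in the log refer to.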
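
Step 9 runs `bedtools multiinter` on the per-sample absent-region files and then filters the result. Each `multiinter` row gives the chromosome, start, end, the number of input files covering that interval, and a comma-separated list of those files, followed by one 0/1 column per file. Assuming the two filters keep rows whose count equals the number of samples ("absent in all samples") or equals 1 ("absent in only one sample"), they can be sketched as below. The helper names are hypothetical, and the header skip assumes the optional `-header` line starts with `chrom`.

```python
from typing import Iterable, List, Tuple

# (chrom, start, end, n_files, file_list) taken from the first five multiinter columns.
Row = Tuple[str, int, int, int, str]


def parse_multiinter(lines: Iterable[str]) -> List[Row]:
    """Parse tab-separated `bedtools multiinter` output, skipping blanks and headers."""
    rows: List[Row] = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "chrom")):
            continue
        fields = line.rstrip("\n").split("\t")
        rows.append((fields[0], int(fields[1]), int(fields[2]), int(fields[3]), fields[4]))
    return rows


def split_by_sharing(rows: List[Row], n_samples: int) -> Tuple[List[Row], List[Row]]:
    """Return (rows absent in every sample, rows absent in exactly one sample)."""
    in_all = [r for r in rows if r[3] == n_samples]
    in_one = [r for r in rows if r[3] == 1]
    return in_all, in_one


if __name__ == "__main__":
    demo = [
        "chrom\tstart\tend\tnum\tlist\tA\tB\tC",
        "chr1\t0\t100\t3\tA,B,C\t1\t1\t1",
        "chr1\t100\t200\t1\tB\t0\t1\t0",
        "chr1\t200\t300\t2\tA,C\t1\t0\t1",
    ]
    in_all, in_one = split_by_sharing(parse_multiinter(demo), n_samples=3)
    print(len(in_all), len(in_one))  # 1 1
```

Note that `multiinter` splits intervals wherever the set of covering files changes, so these counts are counts of such fragments rather than of the original merged regions.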