ConTra is a high-performance Python framework for identifying context-dependent regulatory interactions in multi-omics data. It leverages parallel processing, vectorized operations, and memory-efficient algorithms to analyze complex biological regulatory networks.
Key features:

- `full` and `subset` analysis modes
- Configurable parallelism (`--n-jobs` or `CONTRA_MAX_CORES`)

Installation:

```
git clone https://github.com/sr320/ConTra.git
cd ConTra
pip install -r code/requirements.txt
```

or

```
python3 -m pip install -r code/requirements.txt
```
Run (an interactive prompt will ask for the mode; default = `full`):

```
python3 code/context_dependent_analysis.py
```

Run explicitly in `subset` (faster dev/test) mode with 8 workers:

```
python3 code/context_dependent_analysis.py --mode subset --n-jobs 8
```

Run the full analysis using all detected cores:

```
python3 code/context_dependent_analysis.py --mode full
```
Arguments:

- `--mode {full,subset}`: select analysis breadth
- `--n-jobs N`: override automatic CPU core detection

By default the pipeline uses all available CPU cores reported by Python.
Ways to control cores:
Method | Example | Notes
---|---|---
Command-line flag | `--n-jobs 32` | Explicitly sets the worker count
Environment variable | `CONTRA_MAX_CORES=64` | Upper cap when `--n-jobs` is not provided
Both provided | `CONTRA_MAX_CORES=64 --n-jobs 80` | The flag wins (uses 80 if the system has ≥80 cores)
Examples:

```
# Cap to 64 cores via environment variable
CONTRA_MAX_CORES=64 python3 code/context_dependent_analysis.py --mode full

# Explicitly use 32 cores (ignores CONTRA_MAX_CORES)
python3 code/context_dependent_analysis.py --mode subset --n-jobs 32
```
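The precedence rules above can be sketched in Python. This is a hypothetical helper (the name `resolve_workers` and its signature are made up for illustration; the real logic lives inside `code/context_dependent_analysis.py`):

```python
import os

def resolve_workers(cli_n_jobs=None, env=None):
    """Resolve worker count: --n-jobs wins; CONTRA_MAX_CORES caps auto-detection."""
    env = os.environ if env is None else env
    detected = os.cpu_count() or 1
    if cli_n_jobs is not None:
        # flag wins outright, but never exceed what the machine actually has
        return min(cli_n_jobs, detected)
    cap = env.get("CONTRA_MAX_CORES")
    if cap:
        # env var only caps auto-detection; it is not a hard setting
        return min(detected, int(cap))
    return detected
```

Passing `env` explicitly makes the precedence easy to test without touching the real process environment.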
You can obtain empirical p-values for top correlations using on-demand, early-stopping permutation testing. It is disabled by default to keep runs fast.
Flags:

- `--perm`: enable adaptive permutation testing
- `--perm-min INT`: minimum permutations per tested correlation (default: 1000)
- `--perm-max INT`: maximum permutations (default: 100000)
- `--perm-alpha FLOAT`: target tail-probability precision (default: 0.001); stops early when the achievable precision is reached or the maximum permutation count is hit

How it works: the test is applied per correlation sign, so each tail is targeted at a precision of `perm_alpha/2`.
Example (subset mode, 16 workers, permutations enabled):

```
python3 code/context_dependent_analysis.py --mode subset --n-jobs 16 --perm --perm-min 2000 --perm-max 50000 --perm-alpha 0.0005
```
Result tables (where applicable) gain an `empirical_p` column alongside the existing statistical metrics.
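The idea of adaptive early stopping can be illustrated with a minimal sketch. Everything here is hypothetical (the function names and the plain standard-error stopping rule are stand-ins; ConTra's actual criterion may differ):

```python
import random

def pearson(x, y):
    """Pearson correlation, stdlib only."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)

def adaptive_perm_p(x, y, perm_min=1000, perm_max=100000, perm_alpha=0.001, seed=0):
    """Empirical p-value with early stopping once the estimate is precise enough."""
    rng = random.Random(seed)
    obs = abs(pearson(x, y))
    y_perm = list(y)
    exceed, n_perm = 0, 0
    while n_perm < perm_max:
        rng.shuffle(y_perm)
        n_perm += 1
        if abs(pearson(x, y_perm)) >= obs:
            exceed += 1
        if n_perm >= perm_min:
            # add-one estimator avoids p = 0; stop when its standard error
            # drops below the target precision
            p_hat = (exceed + 1) / (n_perm + 1)
            se = (p_hat * (1 - p_hat) / n_perm) ** 0.5
            if se < perm_alpha:
                break
    return (exceed + 1) / (n_perm + 1)
```

With strongly correlated inputs the permuted statistic rarely reaches the observed one, so the loop exits shortly after `perm_min` with a small empirical p-value.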
Performance tips:

- Start with a modest `--perm-min` (e.g. 1000) to gauge runtime
- Raise `--perm-max` only if you need finer p-value resolution (<1e-4)
- Tightening `--perm-alpha` increases runtime; loosening it speeds things up

Outputs (per run) are written to:
```
output/context_dependent_analysis_<mode>_<YYYYMMDD_HHMMSS>/
  plots/     *.png
  tables/    *.csv
  reports/   *.md, *.html
```
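The run-directory convention above can be sketched as follows (a hypothetical `make_run_dir` helper, not ConTra's actual code; the demo writes into a throwaway temp directory so nothing lands in the repo):

```python
import datetime
import pathlib
import tempfile

def make_run_dir(mode, root="output"):
    """Create the timestamped run directory and its three subfolders."""
    stamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    run = pathlib.Path(root) / f"context_dependent_analysis_{mode}_{stamp}"
    for sub in ("plots", "tables", "reports"):
        (run / sub).mkdir(parents=True, exist_ok=True)
    return run

# demo against a throwaway root
demo_root = tempfile.mkdtemp()
run_dir = make_run_dir("subset", root=demo_root)
```

Timestamping each run keeps results from successive invocations side by side instead of overwriting each other.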
Key tables:

- `methylation_mirna_context.csv`
- `lncrna_mirna_context.csv`
- `multi_way_interactions.csv`
- `*gene_*_correlations.csv` (context-specific networks)

Reports:

- `context_dependent_analysis_report.md` (full) or `subset_context_dependent_analysis_report.md`
Mode | Genes (pairwise) | Genes (multi-way) | Genes (networks) | miRNA top/use | Methylation top/use | lncRNA top/use | Multi-way regulators (miRNA / lncRNA / methylation) | Seed |
---|---|---|---|---|---|---|---|---|
full | all | all | all | 25 / 10 | 50 / 15 | 50 / 15 | 15 / 30 / 25 | none |
subset | 500 | 200 | 200 | 10 / 5 | 10 / 5 | 10 / 5 | 5 / 7 / 5 | 42 |
Subset mode greatly reduces runtime and file sizes while preserving pipeline logic (useful for method development / CI tests).
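The fixed seed in the table above is what makes subset runs reproducible. A hypothetical sampler (`pick_subset` is illustrative, not ConTra's actual selection code) shows the effect:

```python
import random

def pick_subset(feature_ids, n, seed=42):
    """Deterministically sample n features; seed 42 matches subset mode above."""
    rng = random.Random(seed)
    return sorted(rng.sample(list(feature_ids), min(n, len(feature_ids))))

# same seed -> same 500 genes on every run, so dev/CI results are comparable
genes = [f"gene_{i:05d}" for i in range(36084)]
subset = pick_subset(genes, 500)
```

Sorting the sample makes the output order stable regardless of how the RNG draws.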
The repository provides pre-cleaned and standardized multi-omics datasets ready for immediate analysis. All datasets are in CSV format with consistent sample alignment:
Dataset | Features | Samples | Sparsity | Description
---|---|---|---|---
`gene_counts_cleaned.csv` | 36,084 | 40 | 37.8% | Gene expression counts
`lncrna_counts_cleaned.csv` | 15,900 | 40 | 3.8% | Long non-coding RNA expression counts
`mirna_counts_cleaned.csv` | 51 | 40 | 7.8% | MicroRNA expression counts
`wgbs_counts_cleaned.csv` | 249 | 40 | 41.4% | WGBS CpG methylation counts
All datasets contain the same 40 samples representing different time points (TP1-TP4) across 10 different conditions:
```
data/cleaned_datasets/
├── gene_counts_cleaned.csv    # Main gene expression dataset
├── lncrna_counts_cleaned.csv  # Main lncRNA expression dataset
├── mirna_counts_cleaned.csv   # Main miRNA expression dataset
├── wgbs_counts_cleaned.csv    # Main DNA methylation dataset
├── *_summary.txt              # Individual dataset statistics
├── combined_summary.txt       # Overall dataset summary
└── README.md                  # Detailed data documentation
```
These datasets are immediately usable for:

- Multi-omics correlation analysis
- Time-series analysis across the TP1-TP4 time points
- Context-dependent regulatory network inference
- Statistical modeling and machine learning workflows
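A quick stdlib-only sanity check of the shared-sample layout (the toy CSV strings and helper names below are made up for illustration; the real files have 40 sample columns):

```python
import csv
import io

# toy stand-ins for two cleaned datasets: the first column is the feature ID,
# the remaining columns are the shared samples (only 3 of the 40 shown)
gene_csv = "gene_id,TP1_C1,TP2_C1,TP3_C1\ngene_a,5,0,2\ngene_b,0,0,3\n"
mirna_csv = "mirna_id,TP1_C1,TP2_C1,TP3_C1\nmir_1,7,2,0\n"

def sample_columns(text):
    """Return the sample columns (everything after the feature-ID column)."""
    header = next(csv.reader(io.StringIO(text)))
    return header[1:]

def sparsity(text):
    """Fraction of zero entries, as reported in the dataset table above."""
    rows = list(csv.reader(io.StringIO(text)))[1:]
    values = [float(v) for row in rows for v in row[1:]]
    return sum(v == 0 for v in values) / len(values)

aligned = sample_columns(gene_csv) == sample_columns(mirna_csv)
```

Checking header alignment up front catches sample-order mismatches before any correlations are computed.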
ConTra employs several sophisticated approaches to identify context-dependent regulatory interactions:
```
ConTra/
├── code/
│   ├── context_dependent_analysis.py  # Unified full + subset analysis script
│   └── requirements.txt               # Python dependencies
├── data/
│   └── cleaned_datasets/              # Input data files
├── output/                            # Generated results (created at runtime)
├── LICENSE                            # MIT License
└── README.md                          # This file
```
Deprecated: the previous `subset_context_dependent_analysis.py` has been merged; use the unified script with `--mode subset`.
We welcome contributions from the community! Whether you're a bioinformatician, data scientist, or developer, there are many ways to contribute:
1. Create a feature branch (`git checkout -b feature/amazing-feature`)
2. Commit your changes (`git commit -m 'Add amazing feature'`)
3. Push the branch (`git push origin feature/amazing-feature`)
4. Open a pull request

This project is licensed under the MIT License; see the LICENSE file for details.
⭐ Star this repository if you find it useful!

🤝 Contributions are always welcome and appreciated!