# Orthologous Protein Identification Across Three Coral Species

## Introduction

Understanding the evolutionary relationships between genes across different species is fundamental to comparative genomics. In this analysis, we identified orthologous proteins across three coral species using a reciprocal best hits (RBH) approach. The resulting orthology assignments provide a foundation for cross-species comparisons and evolutionary studies.

## Study Species

Our analysis focused on three coral species representing different growth forms and ecological strategies:

- **Acropora pulchra** (D-Apul): A fast-growing, branching coral species
- **Porites evermanni** (E-Peve): A slow-growing, massive coral species
- **Pocillopora tuahiniensis** (F-Ptua): An intermediate growth, branching coral species

These species represent different evolutionary strategies within the coral phylogeny, making them well suited to comparative genomic analysis.

## Methodology

### Orthology Identification Pipeline

The orthology identification pipeline used the following parameters:

- **BLAST algorithm**: All-vs-all protein BLAST comparisons
- **Orthology criterion**: Reciprocal best hits (RBH)
- **E-value threshold**: 1e-5 (significance threshold)
- **Minimum identity**: 30% (allowing for evolutionary divergence)
- **Minimum coverage**: 50% (ensuring substantial sequence overlap)

### Computational Workflow

1. **Database Preparation**: Created BLAST databases for each species' protein sequences
2. **All-vs-All Comparisons**: Performed bidirectional BLAST searches between all species pairs
3. **Reciprocal Best Hits**: Identified protein pairs that are mutual best matches
4. **Ortholog Grouping**: Organized orthologs into groups based on presence across species
5. **Quality Filtering**: Applied identity and coverage thresholds to ensure orthology confidence

## Results

### Protein Sequence Statistics

The analysis began with the protein dataset from each species:

- **Acropora pulchra**: 15,664 total proteins from genome annotation
- **Porites evermanni**: 16,693 total proteins from genome assembly
- **Pocillopora tuahiniensis**: 16,060 total proteins from genome annotation

These datasets represent the annotated protein complement of each species and form the input to the orthology analysis.

### Orthology Classification

Our analysis identified the following categories of orthologous relationships:

1. **Three-way orthologs**: 10,346 proteins present in all three species (highest confidence)
2. **Two-way orthologs**: 7,980 proteins shared between specific species pairs
3. **Total ortholog groups**: 18,326 distinct ortholog groups identified (10,346 three-way plus 7,980 two-way)

### Pairwise Orthology Results

The reciprocal best hits analysis revealed strong orthology relationships between species pairs:

- **Acropora pulchra vs Porites evermanni**: 13,782 orthologous protein pairs
- **Acropora pulchra vs Pocillopora tuahiniensis**: 13,320 orthologous protein pairs
- **Porites evermanni vs Pocillopora tuahiniensis**: 14,303 orthologous protein pairs

### Key Findings

The orthology analysis revealed substantial conservation across the three coral species:

- **High conservation**: ~66% of Acropora pulchra proteins (10,346 of 15,664) have orthologs in both other species
- **Strong pairwise relationships**: Dividing each pairwise RBH count by the total protein count of either species in the pair gives roughly 83-89% (for example, 13,782 of 16,693 Porites evermanni proteins, about 83%, and 14,303 of 16,060 Pocillopora tuahiniensis proteins, about 89%)
- **Core gene set**: The 10,346 three-way orthologs form a candidate conserved core gene set shared by all three species
- **Lineage-specific genes**: The 7,980 two-way orthologs are restricted to a single species pair, a pattern that may point to lineage-specific adaptations

## Technical Implementation

### Computational Resources

The analysis utilized:

- **BLAST databases**: Custom-built for each species
- **Parallel processing**: Multi-threaded BLAST searches for efficiency
- **Quality control**: Multiple filtering steps to ensure orthology confidence

### Data Management

Results were organized into:

- **Ortholog groups**: Hierarchical classification of orthologous relationships
- **Pairwise comparisons**: Detailed RBH results for each species pair
- **Summary statistics**: Overview of orthology patterns

### Output Files

All analysis results are available in the [orthology analysis output directory](https://github.com/urol-e5/timeseries_molecular/tree/main/M-multi-species/output/11-orthology-analysis):

**Core Results:**

- [`ortholog_groups.csv`](https://github.com/urol-e5/timeseries_molecular/blob/main/M-multi-species/output/11-orthology-analysis/ortholog_groups.csv) - Complete ortholog group assignments (18,326 groups)
- [`orthology_summary.csv`](https://github.com/urol-e5/timeseries_molecular/blob/main/M-multi-species/output/11-orthology-analysis/orthology_summary.csv) - Summary statistics and counts

**Pairwise Comparisons:**

- [`apul_peve_rbh.csv`](https://github.com/urol-e5/timeseries_molecular/blob/main/M-multi-species/output/11-orthology-analysis/apul_peve_rbh.csv) - Acropora pulchra vs Porites evermanni reciprocal best hits
- [`apul_ptua_rbh.csv`](https://github.com/urol-e5/timeseries_molecular/blob/main/M-multi-species/output/11-orthology-analysis/apul_ptua_rbh.csv) - Acropora pulchra vs Pocillopora tuahiniensis reciprocal best hits
- [`peve_ptua_rbh.csv`](https://github.com/urol-e5/timeseries_molecular/blob/main/M-multi-species/output/11-orthology-analysis/peve_ptua_rbh.csv) - Porites evermanni vs Pocillopora tuahiniensis reciprocal best hits

**Raw BLAST Results:**

- BLAST output files for all pairwise comparisons (`Apul_vs_Peve.blastp`, `Peve_vs_Apul.blastp`, etc.)
- BLAST databases for each species (`Apul_proteins.*`, `Peve_proteins.*`, `Ptua_proteins.*`)

## Conclusions

The identification of orthologous proteins across three coral species establishes a resource for comparative coral genomics. This analysis reveals both conserved and divergent aspects of coral biology, providing insights into the evolutionary processes that have shaped coral diversity.

The orthology assignments generated here will serve as a reference for future studies investigating coral biology, evolution, and responses to environmental change. By understanding the genetic relationships between these species, we can better predict how different coral lineages may respond to changing environmental conditions.

---

*This analysis represents a key step in our multi-species comparative genomics pipeline, providing the foundation for cross-species comparisons and evolutionary studies in coral biology.*
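
To make the reciprocal best hits criterion from the Methodology section concrete, here is a minimal Python sketch. It is an illustration, not the pipeline's actual code. The `Hit` record, the function names, and the toy protein IDs are all hypothetical. The sketch also assumes that the best hit is the one with the highest bitscore, and that the E-value, identity, and coverage thresholds are checked on both directions of a pair after the mutual best match is found.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One BLAST alignment (hypothetical record, not the pipeline's format)."""
    query: str
    subject: str
    pident: float    # percent identity
    qcov: float      # percent of the query covered by the alignment
    evalue: float
    bitscore: float


def best_hits(hits):
    """Map each query ID to its highest-bitscore hit (first one wins on ties)."""
    best = {}
    for h in hits:
        current = best.get(h.query)
        if current is None or h.bitscore > current.bitscore:
            best[h.query] = h
    return best


def passes(hit, max_evalue=1e-5, min_ident=30.0, min_cov=50.0):
    """Apply the thresholds listed in the Methodology section."""
    return (hit.evalue <= max_evalue
            and hit.pident >= min_ident
            and hit.qcov >= min_cov)


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are mutual best hits and pass the filters."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    pairs = []
    for a_id, fwd in forward.items():
        back = reverse.get(fwd.subject)
        # Keep the pair only if B's best hit points back to the same A protein.
        if back is not None and back.subject == a_id and passes(fwd) and passes(back):
            pairs.append((a_id, fwd.subject))
    return sorted(pairs)


# Toy data with made-up IDs and scores, for illustration only.
toy_a_vs_b = [
    Hit("apul_1", "peve_1", 85.0, 90.0, 1e-50, 300.0),
    Hit("apul_1", "peve_2", 40.0, 60.0, 1e-10, 80.0),   # weaker second hit
    Hit("apul_2", "peve_2", 55.0, 70.0, 1e-30, 150.0),
    Hit("apul_3", "peve_3", 25.0, 80.0, 1e-8, 60.0),    # mutual, but identity < 30%
    Hit("apul_4", "peve_1", 50.0, 70.0, 1e-20, 120.0),  # not reciprocated
]
toy_b_vs_a = [
    Hit("peve_1", "apul_1", 85.0, 88.0, 1e-50, 298.0),
    Hit("peve_2", "apul_2", 55.0, 72.0, 1e-30, 151.0),
    Hit("peve_3", "apul_3", 25.0, 80.0, 1e-8, 60.0),
]

rbh_pairs = reciprocal_best_hits(toy_a_vs_b, toy_b_vs_a)
print(rbh_pairs)  # [('apul_1', 'peve_1'), ('apul_2', 'peve_2')]
```

In this toy example, `apul_3` and `peve_3` are mutual best hits but are dropped by the 30% identity filter, and `apul_4` is dropped because `peve_1` has a better match in `apul_1`. Filtering before selecting best hits instead of after can change which pairs survive, so the ordering used here should be treated as one possible reading of the workflow.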