# 1SE Rule for Rank and Lambda Selection

## Problem: SSE Decreases with Increasing Rank

**Observation:** SSE monotonically decreases as rank increases. This is **expected behavior**, but it makes selecting the "optimal" rank difficult.

**Why this happens:**

- Higher rank = more parameters = better fit to training data
- SSE will always be lowest at the highest rank tested
- This doesn't mean the highest rank is best (risk of overfitting)

**The Challenge:** How do we select a rank that balances:

1. **Model fit** (low SSE)
2. **Model complexity** (low rank)
3. **Generalization** (stable across train/test splits)

## Solution: The 1SE Rule

The **"one standard error rule"** is a parsimony principle from statistical learning (Breiman et al., 1984; Hastie et al., 2009):

> *"Select the simplest model whose performance is within one standard error of the best model."*

### Applied to Rank Selection

**Step 1:** Find the minimum mean SSE across bootstrap iterations

```
Rank 5:  SSE = 1.50e8 ± 0.10e8
Rank 10: SSE = 1.20e8 ± 0.08e8
Rank 15: SSE = 1.10e8 ± 0.07e8
Rank 20: SSE = 1.05e8 ± 0.09e8
Rank 25: SSE = 1.03e8 ± 0.12e8
Rank 30: SSE = 1.01e8 ± 0.15e8  ← Minimum, but higher uncertainty
```

**Step 2:** Calculate the 1SE threshold

```
Min SSE = 1.01e8 (at rank 30)
SE = 0.15e8
1SE Threshold = 1.01e8 + 0.15e8 = 1.16e8
```

**Step 3:** Find all ranks within the threshold

```
Rank 10: 1.20e8 > 1.16e8  ✗ Above threshold
Rank 15: 1.10e8 ≤ 1.16e8  ✓ Within 1SE
Rank 20: 1.05e8 ≤ 1.16e8  ✓ Within 1SE
Rank 25: 1.03e8 ≤ 1.16e8  ✓ Within 1SE
Rank 30: 1.01e8 ≤ 1.16e8  ✓ Within 1SE
```

**Step 4:** Select the smallest rank within the threshold

```
✓ Selected Rank = 15 (most parsimonious within 1SE)
```

### Why This Works

1. **Statistical justification:** Models within 1SE are not significantly different in performance
2. **Parsimony:** Prefer simpler (lower-rank) models when performance is comparable
3. **Generalization:** Reduces overfitting risk
4.
**Stability:** Accounts for uncertainty in cross-validation estimates

### Example Scenarios

#### Scenario A: Clear Winner (Rare)

```
Rank 10: 1.50e8 ± 0.05e8
Rank 15: 1.10e8 ± 0.04e8  ← Min (1SE = 1.14e8)
Rank 20: 1.25e8 ± 0.06e8  ✗ Above threshold

→ Select Rank 15 (unambiguous best)
```

#### Scenario B: Monotonic Decrease (Common in Your Data)

```
Rank 5:  1.50e8 ± 0.10e8  ✗ Above threshold
Rank 10: 1.30e8 ± 0.08e8  ✗ Above threshold
Rank 15: 1.15e8 ± 0.07e8  ✓ Within 1SE
Rank 20: 1.10e8 ± 0.09e8  ✓ Within 1SE
Rank 25: 1.08e8 ± 0.11e8  ✓ Within 1SE
Rank 30: 1.06e8 ± 0.14e8  ← Min (1SE = 1.20e8)

→ Select Rank 15 (smallest within 1SE, despite not having the absolute minimum)
```

#### Scenario C: High Variance at High Ranks

```
Rank 10: 1.30e8 ± 0.05e8  ✗ Above threshold
Rank 15: 1.10e8 ± 0.06e8  ✓ Within 1SE
Rank 20: 1.08e8 ± 0.12e8  ✓ Within 1SE (but high variance)
Rank 25: 1.05e8 ± 0.18e8  ← Min (1SE = 1.23e8, very high variance)

→ Select Rank 15 (both parsimonious AND more stable)
```

## Applied to Lambda Selection

The **same principle** applies to the regularization parameter:

**Step 1:** At the optimal rank, find the maximum mean FMS

```
λ=0.0:  FMS = 0.85 ± 0.05  ← Maximum
λ=0.01: FMS = 0.84 ± 0.04
λ=0.05: FMS = 0.82 ± 0.04
λ=0.1:  FMS = 0.78 ± 0.06
λ=0.5:  FMS = 0.65 ± 0.08
```

**Step 2:** Calculate the 1SE threshold

```
Max FMS = 0.85
SE = 0.05
1SE Threshold = 0.85 - 0.05 = 0.80
```

**Step 3:** Find all lambdas within the threshold

```
λ=0.0:  0.85 ≥ 0.80  ✓
λ=0.01: 0.84 ≥ 0.80  ✓
λ=0.05: 0.82 ≥ 0.80  ✓
λ=0.1:  0.78 < 0.80  ✗
```

**Step 4:** Select the maximum lambda within the threshold

```
✓ Selected λ = 0.05 (most sparse while maintaining factor stability)
```

### Why This Works for Lambda

1. **Sparsity goal:** Higher λ → more zeros in the factors
2. **Stability requirement:** FMS measures factor consistency
3. **Trade-off:** Maximum sparsity without sacrificing factor quality
4. **Interpretation:** Factors with fewer non-zero elements are easier to interpret

## Visualization Updates

The updated rank selection plot now shows:

1.
**Mean SSE with error bars** (bootstrap uncertainty)
2. **1SE threshold line** (horizontal orange dashed line)
3. **Minimum SSE rank** (vertical blue dotted line)
4. **Selected rank by 1SE rule** (vertical red dashed line)
5. **Green circles** around ranks within 1SE
6. **Coefficient of variation panel** (shows stability)

This makes it visually clear:

- Which ranks are statistically comparable
- Why a lower rank was selected despite higher ranks having lower SSE
- How stable each rank is across bootstrap iterations

## Theoretical Background

### Statistical Learning Theory

The 1SE rule comes from **cross-validation theory** (Breiman et al., 1984).

**Key insight:** Cross-validation estimates have uncertainty. Models whose CV scores differ by less than 1 SE are not significantly different.

**Mathematical formulation:**

```
Model A is selected over Model B if:
1. CV_score(A) is within 1×SE(best_score) of best_score
2. Complexity(A) < Complexity(B)
```

### References

1. **Breiman, L., Friedman, J., Stone, C.J., & Olshen, R.A. (1984).** *Classification and Regression Trees.* Wadsworth. (Original CART algorithm with the 1SE rule)
2. **Hastie, T., Tibshirani, R., & Friedman, J. (2009).** *The Elements of Statistical Learning* (2nd ed.). Springer. (Section 7.10: Cross-Validation)
3. **James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013).** *An Introduction to Statistical Learning.* Springer.
(Section 5.1: Cross-Validation)

### Parsimony Principle

**Occam's Razor in statistics:**

> "When multiple models explain the data equally well, prefer the simplest."

**Applied here:**

- Rank = model complexity (more factors = more complex)
- Lambda = feature complexity (lower λ = denser factors)
- 1SE rule = operational definition of "equally well"

## Implementation Details

### Code Changes

**Before (minimum SSE):**

```python
optimal_rank_idx = lambda_zero_df['mean_sse'].idxmin()
optimal_rank = int(lambda_zero_df.loc[optimal_rank_idx, 'rank'])
```

**After (1SE rule):**

```python
# Find the minimum SSE and its standard error
min_sse_idx = lambda_zero_df['mean_sse'].idxmin()
min_sse = lambda_zero_df.loc[min_sse_idx, 'mean_sse']
min_sse_se = lambda_zero_df.loc[min_sse_idx, 'se_sse']

# Calculate the 1SE threshold
sse_1se_threshold = min_sse + min_sse_se

# Select the smallest rank within the threshold
within_1se = lambda_zero_df[lambda_zero_df['mean_sse'] <= sse_1se_threshold]
optimal_rank = int(within_1se.sort_values('rank').iloc[0]['rank'])
```

### Features Maintained

✅ **Parallel execution** (`n_jobs=-1` for all cores)
✅ **Incremental checkpointing** (saves after each bootstrap iteration)
✅ **Resume capability** (loads from checkpoints)
✅ **Bootstrap stability assessment** (CV, convergence rates)
✅ **Comprehensive visualization** (4 diagnostic plots)

### New Output

The terminal output now includes:

```
STAGE 1: RANK SELECTION (1SE RULE)
Criterion: Smallest rank within 1SE of minimum SSE at λ=0.0
================================================================================
Minimum SSE: 1.06e+08 ± 1.40e+07 (at rank=30)
1SE Threshold: 1.20e+08

Applying 1SE rule (parsimony principle):
Select smallest rank where SSE ≤ 1.20e+08

✓ OPTIMAL RANK (1SE rule): 15
  Mean SSE: 1.15e+08 ± 7.00e+06
  (smallest rank with SSE within 1SE of minimum)

Note: Rank 30 had the absolute minimum SSE, but rank 15 was selected
by the 1SE rule (more parsimonious)

Ranks within 1SE of minimum:
  rank  mean_sse     se_sse    n_converged
  15    1.15000e+08  7.00e+06  10
  20    1.10000e+08  9.00e+06  10
  25    1.08000e+08  1.10e+07  10
  30    1.06000e+08  1.40e+07  10
```

## Practical Recommendations

### When to Use the 1SE Rule

✅ **Use the 1SE rule when:**

- SSE monotonically decreases with rank
- You want parsimonious models
- High ranks show increased variance
- Interpretability is important
- Avoiding overfitting is a priority

❌ **Consider alternatives when:**

- There is a clear elbow point in the SSE curve
- A dramatic SSE improvement occurs at a specific rank
- External validation shows a different optimal rank
- Domain knowledge suggests a specific rank

### Interpreting Results

**If selected rank = minimum SSE rank:**

- Unambiguous winner
- The 1SE rule confirms the minimum-SSE choice

**If selected rank < minimum SSE rank:**

- Parsimony principle applied
- The selected rank is "good enough" but simpler
- This is the intended behavior!

**If most ranks are within 1SE:**

- High uncertainty in rank selection
- Consider: more bootstrap iterations, a different rank range, or examining data quality

### Troubleshooting

**Problem:** All ranks are within 1SE (rank 5 is selected)
**Solution:** SSE differences are too small relative to the uncertainty. Either:

1. Increase bootstrap iterations (more precise SE estimates)
2. Expand the rank range downward (test ranks 3 and 4)
3. Accept rank 5 as adequate

**Problem:** No ranks are within 1SE except the minimum
**Solution:** A clear winner was found. The 1SE rule reduces to the minimum-SSE rule in this case.

**Problem:** Very high SE at the optimal rank
**Solution:** The model is unstable across train/test splits. Consider:

1. A different rank
2. Data preprocessing (outliers, normalization)
3. More bootstrap iterations to confirm

## Summary

The **1SE rule** provides a principled, statistically justified method for selecting rank when SSE decreases monotonically:

1. **Addresses overfitting** by preferring simpler models
2. **Accounts for uncertainty** in bootstrap CV estimates
3. **Follows established precedent** from the statistical learning literature
4. **Maintains all existing features** (parallelization, checkpointing, resume)
5.
**Provides clear visualization** of selection rationale

**Bottom line:** Instead of always selecting the highest rank (lowest SSE), the 1SE rule selects the **simplest model that performs comparably**, which is more likely to generalize to new data.
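As a self-contained sketch, both selection rules described above (smallest rank within 1SE of the minimum SSE; largest λ within 1SE of the maximum FMS) can be written as two small helpers and run against the worked-example numbers. The column names (`rank`, `mean_sse`, `se_sse`, `lam`, `mean_fms`, `se_fms`) are illustrative assumptions and may differ from the actual pipeline's DataFrames.

```python
import pandas as pd

def select_rank_1se(df: pd.DataFrame) -> int:
    """Smallest rank whose mean SSE is within 1 SE of the minimum mean SSE."""
    min_idx = df['mean_sse'].idxmin()
    threshold = df.loc[min_idx, 'mean_sse'] + df.loc[min_idx, 'se_sse']
    within = df[df['mean_sse'] <= threshold]
    return int(within['rank'].min())

def select_lambda_1se(df: pd.DataFrame) -> float:
    """Largest lambda whose mean FMS is within 1 SE of the maximum mean FMS."""
    max_idx = df['mean_fms'].idxmax()
    threshold = df.loc[max_idx, 'mean_fms'] - df.loc[max_idx, 'se_fms']
    within = df[df['mean_fms'] >= threshold]
    return float(within['lam'].max())

# Numbers from the worked examples in this document
ranks = pd.DataFrame({
    'rank':     [5, 10, 15, 20, 25, 30],
    'mean_sse': [1.50e8, 1.20e8, 1.10e8, 1.05e8, 1.03e8, 1.01e8],
    'se_sse':   [0.10e8, 0.08e8, 0.07e8, 0.09e8, 0.12e8, 0.15e8],
})
lams = pd.DataFrame({
    'lam':      [0.0, 0.01, 0.05, 0.1, 0.5],
    'mean_fms': [0.85, 0.84, 0.82, 0.78, 0.65],
    'se_fms':   [0.05, 0.04, 0.04, 0.06, 0.08],
})

print(select_rank_1se(ranks))    # → 15
print(select_lambda_1se(lams))   # → 0.05
```

Note that both rules share one shape — compute the best score, widen it by one SE, then take the most parsimonious candidate inside that band — differing only in the direction of "best" (minimize SSE vs. maximize FMS) and of "parsimonious" (smallest rank vs. largest λ).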