End-to-End Comp Model Walkthrough

End-to-end model walkthrough — 30-minute interview answer

9 phases · Full ML pipeline · Legal risk · ROI output · Microsoft-level depth

Phase 1 — Problem framing and business question definition ~2 min

Before a single line of SQL is written, I define the business question precisely. The hiring manager is listening for whether you frame problems analytically or react to requests. Here's the framing for a comp plan redesign model:

Say this verbatim

"The business question is: if we change this comp plan, what will it cost, how will rep behavior change, and is the investment justified by the revenue it generates? That question has three sub-problems: a cost prediction problem, a behavioral prediction problem, and an ROI estimation problem. Each requires a different model. I build all three, connect them, and present a unified output. Let me walk you through how."

The three sub-problems and their model families

Sub-problem 1 — Cost prediction

Question: What will this plan cost under a range of attainment outcomes?
Model family: Multivariate regression + Monte Carlo simulation
Target variable (Y): Total incentive payout per rep
Output: P50/P90 cost distribution, scenario range

Sub-problem 2 — Behavioral prediction

Question: How will reps change their selling behavior in response to the new plan?
Model family: Difference-in-differences regression (natural experiment) or gradient boosted trees for segment-level elasticity
Target variable (Y): Attainment% change
Output: Behavioral elasticity by rep segment

Sub-problem 3 — ROI estimation

Question: Does the revenue generated exceed the cost of the plan change?
Model family: Cost-impact decomposition + sensitivity analysis
Target variable (Y): Net impact = ΔRevenue − ΔCost
Output: ROI ratio, break-even lift, cost of sales %

Why this framing wins interviews

Most candidates jump straight to "I'd build a regression model." Separating the problem into three sub-problems with different model families signals that you understand the analytical landscape — that cost prediction and behavioral prediction are fundamentally different problems requiring different methods. That's senior-level thinking.

Phase 2 — Data sourcing, schema design, and SQL query building ~4 min

Data sourcing is where many candidates wave their hands. Don't. Name the tables, describe the grain, explain the joins. At Microsoft this data lives across multiple systems — describe the integration pattern.

Data sources and schema

Source 1 — SAP Commissions (comp system)

Tables: rep_payout_history, plan_parameters, quota_assignments, attainment_monthly. Grain: one row per rep per measurement period. Contains actual payout, quota, attainment%, plan ID, and all curve parameters.

Source 2 — Dynamics 365 / CRM (sales activity)

Tables: opportunity_history, activity_log, pipeline_snapshot. Grain: one row per opportunity per stage-date. Contains deal size, stage velocity, win rate, product mix, customer segment. Critical for behavioral features.

Source 3 — HR / Workday (rep attributes)

Tables: employee_profile, territory_assignment, tenure_history. Grain: one row per employee per period. Contains tenure, role, segment, manager, quota-to-market-size ratio. Used for fairness testing and cohort analysis.

Source 4 — Market data (external)

Territory-level total addressable market (TAM), competitive presence, regional economic indicators. Joined to territory table. Critical for quota fairness testing and regression-based quota model. Source: Radford, Gartner, or internal market intelligence team.

The foundation query — building the analytical base table

-- Base table: one row per rep per fiscal quarter
-- Grain: rep_id × fiscal_quarter
-- Source: Snowflake (Microsoft internal DW on Azure Synapse / Fabric)

WITH rep_base AS (
  SELECT
    r.rep_id,
    r.fiscal_quarter,
    r.segment,                          -- Enterprise / SMC / Partner
    r.solution_area,                    -- Azure / M365 / Security
    r.quota_amount,
    r.actual_revenue,
    r.actual_revenue / r.quota_amount AS attainment_pct,
    r.plan_id,
    p.ti_amount,
    p.accel_rate,
    p.threshold_pct,
    p.cap_multiplier,
    p.base_salary,
    r.actual_revenue / r.quota_amount * p.ti_amount AS raw_incentive_earned

  FROM attainment_monthly r
  JOIN plan_parameters p ON r.plan_id = p.plan_id
                         AND r.fiscal_quarter BETWEEN p.effective_start AND p.effective_end
),

rep_attributes AS (
  SELECT
    e.rep_id,
    e.tenure_months,
    e.manager_id,
    e.territory_id,
    t.tam_usd,                          -- total addressable market
    t.current_penetration_pct,
    e.quota_amount / NULLIF(t.tam_usd, 0) AS quota_to_tam_ratio
  FROM employee_profile e
  JOIN territory_assignment t ON e.territory_id = t.territory_id
),

rep_behavior AS (
  SELECT
    rep_id,
    fiscal_quarter,
    COUNT(DISTINCT opportunity_id)      AS deal_count,
    AVG(deal_size_usd)                  AS avg_deal_size,
    SUM(CASE WHEN product_family = 'Azure'
             THEN deal_size_usd ELSE 0 END)
    / NULLIF(SUM(deal_size_usd), 0)     AS azure_mix_pct,
    AVG(days_to_close)                  AS avg_cycle_days,
    SUM(CASE WHEN close_quarter = fiscal_quarter
             THEN deal_size_usd ELSE 0 END)
    / NULLIF(SUM(deal_size_usd), 0)     AS q_end_concentration   -- sandbagging signal
  FROM opportunity_history
  GROUP BY rep_id, fiscal_quarter
)

SELECT
  b.*,
  a.tenure_months,
  a.tam_usd,
  a.quota_to_tam_ratio,
  a.current_penetration_pct,
  bh.deal_count,
  bh.avg_deal_size,
  bh.azure_mix_pct,
  bh.avg_cycle_days,
  bh.q_end_concentration
FROM rep_base b
LEFT JOIN rep_attributes a  ON b.rep_id = a.rep_id
LEFT JOIN rep_behavior bh   ON b.rep_id = bh.rep_id
                            AND b.fiscal_quarter = bh.fiscal_quarter
ORDER BY b.fiscal_quarter, b.rep_id;

Say this about the query

"I build one analytical base table at the grain of rep × quarter. This is the single source of truth for all downstream models — cost prediction, behavioral prediction, and fairness testing all join to this table. I use CTEs rather than nested subqueries for readability and auditability. The q_end_concentration column is a sandbagging signal — a high value means the rep is closing deals disproportionately in the last weeks of the quarter, which is a plan design response I want to measure and eventually explain."

Phase 3 — Feature engineering and variable construction ~4 min

Feature engineering is where domain expertise becomes model accuracy. A generic ML engineer builds features from what's in the data. You build features from what drives the outcome — using your comp plan knowledge to construct variables that have causal meaning, not just correlation.

Feature categories and their analytical purpose

marginal_incentive_rate

Engineered — plan mechanics

TI × dCurve/dAttain at current attainment. The $ earned per additional 1% of attainment. The behavioral lever — not TI itself.

High R² impact

distance_to_kink

Engineered — behavioral

Quota% needed to reach next accelerator step. Near-zero = strong behavioral pull. Used to detect tier-chasing.

High behavioral signal

quota_difficulty_index

Engineered — fairness

Quota / TAM × (1 / growth_rate). Adjusts quota for territory opportunity. Controls for structural difficulty in regression.

Controls confounding

trailing_attainment_3q

Lagged — performance history

Rolling 3-quarter average attainment. Controls for rep ability — separates plan response from rep quality.

Reduces omitted variable bias

plan_change_exposure

Treatment indicator — DID

Binary: did this rep's plan change in this period? Used as treatment variable in difference-in-differences for elasticity estimation.

Causal identification

q4_pipeline_pull_fwd

Engineered — sandbagging

Ratio of Q4 close rate to Q1–Q3 close rate for same rep. Values >1.5 indicate pipeline timing manipulation.

Behavioral risk flag

comp_ratio

External — market competitiveness

Rep OTE / market median OTE (Radford). Below 0.9 = flight risk regardless of plan design. Controls for attrition in payout model.

Attrition control

manager_attainment_avg

Hierarchical — manager effect

Rolling avg attainment of all reps under same manager. Captures coaching effect — controls for manager quality confounding.

Reduces clustering bias

Feature engineering code — Python + PySpark on Fabric/Databricks

import pandas as pd
import numpy as np
from pyspark.sql import functions as F

# Load analytical base table
df = spark.table("comp_analytics.rep_base_table")

# 1. Marginal incentive rate — the behavioral lever
# dPayout/dAttain at current attainment
def marginal_incentive(attain, ti, accel, floor, cap):
    if attain < floor:
        return 0
    elif attain <= 1.0:
        return ti / (1 - floor)          # slope of linear zone
    elif attain <= cap:
        return ti * accel                # accelerator zone slope
    else:
        return 0                         # above cap — no marginal incentive

df = df.withColumn("marginal_incentive_rate",
    F.udf(marginal_incentive)(
        F.col("attainment_pct"),
        F.col("ti_amount"),
        F.col("accel_rate"),
        F.col("threshold_pct"),
        F.col("cap_multiplier")
    ))

# 2. Distance to next kink point (accelerator step)
# How far is the rep from the next behavioral inflection?
df = df.withColumn("distance_to_kink",
    F.when(F.col("attainment_pct") < 1.0,
           F.lit(1.0) - F.col("attainment_pct"))          # distance to quota
     .when(F.col("attainment_pct") < 1.2,
           F.lit(1.2) - F.col("attainment_pct"))          # distance to step 2
     .when(F.col("attainment_pct") < 1.5,
           F.lit(1.5) - F.col("attainment_pct"))          # distance to step 3
     .otherwise(F.lit(0)))

# 3. Quota difficulty index — controls for structural territory hardness
df = df.withColumn("quota_difficulty_index",
    F.col("quota_amount") /
    (F.col("tam_usd") * F.col("market_growth_rate") + F.lit(1e-6)))

# 4. Trailing 3-quarter attainment — controls for rep ability
window = Window.partitionBy("rep_id").orderBy("fiscal_quarter_int")
               .rowsBetween(-3, -1)
df = df.withColumn("trailing_attainment_3q",
    F.avg("attainment_pct").over(window))

# 5. Sandbagging signal — Q-end deal concentration
df = df.withColumn("sandbagging_flag",
    (F.col("q_end_concentration") > 1.5).cast("int"))

# 6. Interaction term: marginal rate × distance to kink
# Captures: high incentive + close to step = maximum behavioral pull
df = df.withColumn("incentive_proximity_interaction",
    F.col("marginal_incentive_rate") * (1 / (F.col("distance_to_kink") + 0.01)))

The insight that impresses interviewers

"The most important feature is not TI — it's marginal_incentive_rate. TI is the total pot. Marginal rate is what the rep earns for the next dollar of effort at their current position on the curve. A rep at 98% of quota has a very different marginal rate than a rep at 102% — even if their TI is identical. Using TI as the behavioral predictor instead of marginal rate is a common modeling error that I specifically correct for."

Phase 4 — Model selection and justification ~4 min

Model selection is where most interviews are won or lost. Don't say "I'd use regression." Say which regression, why, what alternatives you considered, and what assumptions you're testing. Then show you know when to escalate to ensemble methods.

Model selection by sub-problem

Sub-problem 1: Cost prediction → OLS multivariate regression (primary)

Why OLS first: Interpretability is critical — every coefficient must be explainable to a finance audience. OLS gives you β coefficients with standard errors and p-values. You can say "a 10% increase in TI increases expected payout by $X, holding attainment constant."

Model form:

Payout_i = β₀ + β₁(attainment_i) + β₂(TI_i) + β₃(accel_rate_i) + β₄(attainment_i²) + β₅(attainment_i × accel_rate_i) + β₆(tenure_i) + β₇(quota_difficulty_i) + ε_i

The attainment² term: Captures the nonlinearity of the accelerator zone. Without it, OLS underestimates payout for high attainers.

The interaction term (attainment × accel_rate): Tests whether the accelerator effect is larger at higher attainment — which it should be. If the coefficient is significant and positive, the accelerator is working as intended.

Sub-problem 2: Behavioral prediction → Difference-in-Differences (causal) + GBM (predictive)

Difference-in-Differences (DiD): The gold standard for causal behavioral estimation when a natural experiment exists. Treatment group = reps whose plan changed. Control group = reps whose plan didn't change in the same period. Identifies the causal effect of the plan change on attainment, controlling for time trends.

Attainment_it = α + β₁(Treated_i) + β₂(Post_t) + β₃(Treated_i × Post_t) + γX_it + ε_it β₃ is the DiD estimator — the causal effect of the plan change. This is the behavioral elasticity.

GBM (Gradient Boosted Machine) for segment-level prediction: When DiD isn't available (new plan with no prior experiment), use GBM to predict attainment from features. GBM captures nonlinear relationships and interactions automatically. XGBoost or LightGBM. Use SHAP values to explain feature importance to business stakeholders.

Sub-problem 3: Quota fairness → Fixed-effects panel regression

Why fixed effects: Controls for time-invariant rep characteristics (ability, territory quality) that we can't fully observe. Rep fixed effects absorb everything constant about a rep — isolating the variation we care about (plan mechanics, market conditions).

Attainment_it = α_i + β₁(QuotaDifficulty_it) + β₂(TAM_growth_it) + β₃(MarginalRate_it) + ε_it α_i = rep fixed effect. If QuotaDifficulty_it has a large negative coefficient after controlling for fixed effects, quota is structurally harder for certain territory types — a fairness finding.

Model selection decision framework

Interpretability required → OLS / logistic / panel regression

When output goes to finance, HR, or legal — stakeholders who need to understand every coefficient — always start with interpretable models. Black-box accuracy is not worth the governance risk.

Causal question → DiD, RDD, IV regression

Whenever the question is "what caused this" — use causal inference methods. OLS on observational data answers correlation, not causation. For behavioral elasticity, causal identification is non-negotiable.

Predictive accuracy required → GBM / XGBoost + SHAP

When the question is "what will happen" rather than "why did it happen" — use gradient boosting. Add SHAP values to regain interpretability. Never deploy a black-box model to a finance audience without SHAP explanations.

Phase 5 — Model training, validation, and R² improvement ~4 min

Model training is not "fit and done." It's an iterative process of diagnosing fit, identifying omitted variables, and adding features that improve R² with theoretical justification. Show the hiring manager you know why R² moves, not just that it does.

Training pipeline — step by step

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
import statsmodels.api as sm
import statsmodels.formula.api as smf

# ── 1. TRAIN / VALIDATION / TEST SPLIT ──────────────────────────────
# Temporal split — NEVER random split on time-series data
# Train: FY2021–FY2022 | Val: FY2023 Q1-Q2 | Test: FY2023 Q3-Q4
train = df[df['fiscal_year'] <= 2022]
val   = df[(df['fiscal_year'] == 2023) & (df['fiscal_quarter'].isin(['Q1','Q2']))]
test  = df[(df['fiscal_year'] == 2023) & (df['fiscal_quarter'].isin(['Q3','Q4']))]

# ── 2. BASELINE OLS (Iteration 1) ────────────────────────────────────
# Start simple — establish baseline R²
model_v1 = smf.ols(
  'payout ~ attainment_pct + ti_amount + accel_rate',
  data=train
).fit()
print(f"Baseline R²: {model_v1.rsquared:.3f}")   # expect ~0.55–0.65

# ── 3. ADD NONLINEAR TERM (Iteration 2) ──────────────────────────────
# attainment² captures accelerator zone nonlinearity
train['attainment_sq'] = train['attainment_pct'] ** 2
model_v2 = smf.ols(
  'payout ~ attainment_pct + attainment_sq + ti_amount + accel_rate',
  data=train
).fit()
print(f"R² with quadratic term: {model_v2.rsquared:.3f}")   # expect +0.05–0.08

# ── 4. ADD INTERACTION TERM (Iteration 3) ────────────────────────────
# attainment × accel_rate: does the accelerator effect grow with attainment?
model_v3 = smf.ols(
  'payout ~ attainment_pct + attainment_sq + ti_amount + accel_rate '
           '+ attainment_pct:accel_rate',
  data=train
).fit()
print(f"R² with interaction: {model_v3.rsquared:.3f}")      # expect +0.02–0.04

# ── 5. ADD BEHAVIORAL AND CONTEXT FEATURES (Iteration 4) ─────────────
model_v4 = smf.ols(
  'payout ~ attainment_pct + attainment_sq + ti_amount + accel_rate '
           '+ attainment_pct:accel_rate '
           '+ trailing_attainment_3q + quota_difficulty_index '
           '+ tenure_months + marginal_incentive_rate',
  data=train
).fit()
print(f"R² with full features: {model_v4.rsquared:.3f}")    # expect ~0.78–0.85

# ── 6. SIGNIFICANCE TESTING ──────────────────────────────────────────
# Keep features only if p < 0.05 AND theoretically justified
print(model_v4.summary())
# Drop features with p > 0.10 unless theoretically critical
# Check VIF for multicollinearity
from statsmodels.stats.outliers_influence import variance_inflation_factor
vif = pd.DataFrame({
    'feature': train[features].columns,
    'VIF': [variance_inflation_factor(train[features].values, i)
            for i in range(len(features))]
})
# Flag VIF > 10 as multicollinearity risk

# ── 7. CROSS-VALIDATION ──────────────────────────────────────────────
# 5-fold CV on training data — ensure R² is not overfit
pipeline = Pipeline([('scaler', StandardScaler()),
                     ('model', LinearRegression())])
cv_scores = cross_val_score(pipeline, train[features], train['payout'],
                             cv=KFold(n_splits=5, shuffle=False),
                             scoring='r2')
print(f"CV R² mean: {cv_scores.mean():.3f} ± {cv_scores.std():.3f}")

# ── 8. RESIDUAL DIAGNOSTICS ──────────────────────────────────────────
# Check: normality, homoscedasticity, no autocorrelation
residuals = model_v4.resid
# Durbin-Watson test for autocorrelation (want ~2.0)
from statsmodels.stats.stattools import durbin_watson
dw = durbin_watson(residuals)
# Breusch-Pagan for heteroscedasticity
from statsmodels.stats.diagnostic import het_breuschpagan
bp_test = het_breuschpagan(residuals, model_v4.model.exog)

The R² improvement narrative — what to say

Baseline R² ~0.60 — attainment + TI + accel only

Base model explains 60% of payout variance. The remaining 40% is unexplained — driven by nonlinearity, interactions, and omitted variables. This is your starting point, not your answer.

+attainment² → R² ~0.67 (+7%)

The quadratic term captures the accelerator zone nonlinearity. Without it, the model systematically underpredicts payout for high attainers — the most financially important segment. This is not a data science trick — it's a direct encoding of the plan mechanics.

+attainment × accel interaction → R² ~0.70 (+3%)

Tests whether the accelerator effect is larger at higher attainment. If coefficient is positive and significant, it validates that the accelerator design is working as intended — reps in the accelerator zone respond more strongly to marginal rate changes.

+trailing attainment, quota difficulty, tenure → R² ~0.82 (+12%)

These control for rep ability and territory structural advantage — the two largest omitted variable biases in the baseline model. Without them, plan effects are confounded with rep quality and market opportunity. Adding them isolates the true plan mechanics effect.

Phase 6 — Simulation layer: scenario modeling, Monte Carlo, sensitivity analysis ~4 min

The regression model gives you a point estimate. The simulation layer converts that point estimate into a probability distribution — which is what finance actually needs. The three simulation tools serve three different questions.

How simulation connects to the regression model

Step 1 — Use regression to parameterize the attainment distribution

The OLS model gives you: predicted attainment mean (ŷ) and residual standard deviation (σ_ε) for each rep segment. These become the distribution parameters for the Monte Carlo draws. Monte Carlo is not guessing — it's sampling from a calibrated, model-derived distribution.

attain_i ~ Normal(ŷ_i, σ_ε_i) ← model-derived, not assumed

payout_i = Base + TI × curve(attain_i, new_plan_params)

Step 2 — Scenario modeling for leadership

Three scenarios anchored at model-predicted percentiles of the attainment distribution: P25 (downside), P50 (base), P75 (upside). Each scenario runs the payout formula and produces total cost and % at quota. Fast, readable, executive-ready.

Step 3 — Monte Carlo for finance

1,000–5,000 simulation runs. Each run draws every rep's attainment from the model-calibrated distribution, applies the new plan formula, records total cost. Output: full cost distribution with P10/P50/P90. P50 = budget anchor, P90 − P50 = reserve recommendation. Probability of exceeding budget = finance's risk metric.

Step 4 — Sensitivity analysis for governance prioritization

Re-run Monte Carlo with each plan parameter perturbed ±10%. Record P90 cost change for each perturbation. Parameters with highest ΔP90 = highest governance priority. Build a tornado chart. This directly answers: "which levers need a CFO sign-off vs. VP sign-off?"

Monte Carlo implementation

import numpy as np
from scipy import stats

def run_monte_carlo(rep_df, new_plan_params, n_sims=2000, seed=42):
    """
    rep_df: DataFrame with predicted_attain_mean and residual_std per rep
    new_plan_params: dict with ti, accel_rate, threshold, cap, base
    """
    np.random.seed(seed)
    N = len(rep_df)
    total_costs = np.zeros(n_sims)

    for sim in range(n_sims):
        # Draw attainment for every rep from model-calibrated Normal
        attainments = np.random.normal(
            loc=rep_df['predicted_attain_mean'].values,
            scale=rep_df['residual_std'].values
        ).clip(min=0)

        # Apply new payout formula to each rep
        payouts = vectorized_payout(
            attainments,
            ti=new_plan_params['ti'],
            accel=new_plan_params['accel_rate'],
            floor=new_plan_params['threshold'],
            cap=new_plan_params['cap'],
            base=new_plan_params['base_salary']
        )
        total_costs[sim] = payouts.sum()

    return {
        'p10': np.percentile(total_costs, 10),
        'p50': np.percentile(total_costs, 50),
        'p90': np.percentile(total_costs, 90),
        'prob_over_budget': (total_costs > budget).mean(),
        'distribution': total_costs
    }

def sensitivity_analysis(rep_df, base_params, param_perturbations, budget):
    """
    Runs Monte Carlo for each parameter perturbation.
    Returns tornado chart data: which parameter moves P90 most.
    """
    results = {}
    base = run_monte_carlo(rep_df, base_params)
    base_p90 = base['p90']

    for param, delta in param_perturbations.items():
        perturbed = base_params.copy()
        perturbed[param] += delta
        result = run_monte_carlo(rep_df, perturbed)
        results[param] = result['p90'] - base_p90

    return pd.Series(results).sort_values(ascending=False)

Phase 7 — Predicting behavioral changes in the rep population ~4 min

This is the section that separates a data scientist from a comp analytics expert. Behavioral prediction requires not just model output — it requires comp domain knowledge to interpret what reps will actually do and flag unintended consequences before they happen.

Five behavioral changes to predict and flag

Behavior 1 — Accelerator chasing (intended)

What happens: Reps near the accelerator kink point push harder to cross it. How to detect: In the DiD model, reps with distance_to_kink < 5% show significantly higher Q4 activity after accelerator rate increase. Analytics deliverable: Flag the % of rep population within 5% of a kink point — this is the "responsive segment" whose behavior will change.

Behavior 2 — Sandbagging / pipeline timing (unintended)

What happens: Reps who hit the cap early bank pipeline for Q1. Reps near a tier boundary hold deals to ensure they hit it clean. How to detect: q_end_concentration feature — spike after plan change indicates sandbagging. Run autocorrelation analysis on rep-level quarterly revenue: if Q4 is systematically high and Q1 low, pipeline timing is occurring. Analytics deliverable: Flag this in the behavioral prediction report. Recommend a quarterly minimum attainment provision in the plan.

Behavior 3 — Product mix gaming (unintended)

What happens: If the product mix overlay rewards Azure revenue ≥ 30%, reps will attach low-value Azure SKUs to deals just to hit the threshold — without genuine Azure expansion. How to detect: Spike in Azure deal count with declining avg Azure deal size after overlay introduction. Run t-test on pre/post avg Azure deal size for the affected rep segment. Analytics deliverable: Recommend a minimum Azure revenue floor (not just percentage) to prevent SKU stuffing.

Behavior 4 — Attrition of bottom performers (intended but costly)

What happens: Raising the threshold from 0% to 50% removes the safety net for struggling reps — some will leave. How to predict: Logistic regression: P(voluntary_exit) ~ comp_ratio + trailing_attainment_3q + new_threshold_exposure + tenure. Reps with comp_ratio < 0.9 AND trailing attainment < 60% are high flight risk under a threshold increase. Analytics deliverable: Estimate expected attrition increase and backfill cost — factor into the ROI model.

Behavior 5 — Top performer saturation at cap (unintended)

What happens: Reps who hit the cap in Q2 have zero marginal incentive for Q3-Q4. They coast — or start their job search. How to detect: For reps who hit cap in prior years, analyze Q3-Q4 activity levels vs. pre-cap behavior. Analytics deliverable: Model the revenue foregone from cap saturation. Recommend either raising the cap, introducing a "stretch" accelerator tier, or adding a Q3-Q4 activity MBO to maintain engagement.

Say this in the interview

"The behavioral prediction section is where I go beyond modeling what the plan costs and into modeling what it does to rep behavior — including the unintended consequences. I flag five behavioral responses: accelerator chasing (intended, quantify it), sandbagging (unintended, detect with pipeline timing analysis), product mix gaming (unintended, detect with avg deal size test), bottom performer attrition (intended but has backfill cost), and top performer cap saturation (unintended revenue loss). Each of those flags comes with a specific analytical test and a plan design recommendation. That's the comp analytics function working at its highest value — not just modeling cost, but shaping design."

Phase 8 — Legal risk identification and remediation ~4 min

This section is your differentiator. Most analytics candidates skip legal entirely. Walking through it signals that you understand the governance landscape and can operate at the intersection of data, business, and compliance — which is exactly what a senior role at Microsoft requires.

The four legal risk categories and how analytics addresses each

1. Disparate impact — EEOC / OFCCP

Risk: Plan produces systematically lower payouts for a protected class (gender, race, age) after controlling for performance and role.

Analytics fix: Run multivariate regression with payout as Y, add protected class indicators after controlling for role/level/tenure/attainment. If coefficient on gender is statistically significant and negative → disparate impact. Flag before launch, not after.

Payout = β₀ + β₁(Attain) + β₂(Level) + β₃(Tenure) + β₄(Female) + ε

β₄ should be non-significant. If p < 0.05 → legal risk. If β₄ < 0 → women are systematically underpaid after controls.

2. Quota inequity — embedded in design

Risk: If territory assignments correlate with rep demographics, inequitable quotas create disparate comp outcomes even with a "fair" plan formula.

Analytics fix: Run quota_difficulty_index by demographic cohort. If female reps systematically have higher difficulty index scores in the same role/segment → the quota-setting methodology must be audited. This is a legal risk embedded two layers up from the payout formula.

3. Wage and hour compliance — FLSA / state law

Risk: Clawback provisions, draw against commission, and minimum wage requirements vary by state. California (Labor Code §221) restricts clawbacks. Washington has specific commission agreement requirements.

Analytics fix: Build a geo-flag in the model — any rep in a state with restrictive clawback law gets flagged for legal review before the plan document is finalized. Run scenario analysis on clawback recovery rates by state to model expected net cost.

4. Channel conflict / anti-kickback

Risk: Product mix overlays that push reps to steer customers toward specific products can trigger anti-kickback concerns in healthcare, government, or regulated industries.

Analytics fix: Flag any plan with a product-specific overlay for legal review when the customer segment includes healthcare or government entities. Model the overlay payout distribution — if a small % of reps are earning large overlay payouts from a single customer segment, flag the concentration risk.

Say this in the interview

"Legal risk analysis is a mandatory step in my model output — not an afterthought. I run four tests before any plan goes to the governance committee: disparate impact regression (are protected classes systematically underpaid after controls?), quota equity analysis (are territory assignments creating structural inequity by demographic cohort?), geo-flag for wage and hour compliance (which states have clawback or minimum earnings requirements?), and channel conflict screening (do any product mix overlays create anti-kickback exposure in regulated customer segments?). If any test fires, the model output includes a remediation recommendation before I present to legal."

Phase 9 — ROI output, stakeholder communication, and governance ~4 min

The model is only as valuable as the decision it enables. This phase is about converting analytical output into a governance-ready package that gives every stakeholder exactly what they need — and nothing they don't.

The complete output package — by stakeholder

For sales leadership — 1 page, scenario table

Three scenarios (downside/base/upside), four metrics (total cost, % at quota, median payout, cost % of revenue). One chart — cost range bar. One sentence recommendation. No formulas, no p-values. Behavioral change flags in plain language: "We expect 15% of reps to push harder near the 120% tier boundary; expect a slight Q4 pipeline concentration effect."

For finance — 2 pages, Monte Carlo + cost decomposition

P50 and P90 cost distribution. Reserve recommendation = P90 − P50. Cost decomposition waterfall: TI change vs. accelerator change vs. modifier interaction. ROI ratio and break-even attainment lift. Cost of sales % under old and new plan vs. industry benchmark. Elasticity assumption with source and confidence bounds.

For HR and legal — 2 pages, compliance analysis

Disparate impact regression results: coefficient on each protected class indicator, p-value, interpretation. Quota equity distribution by demographic cohort. State-level compliance flag table: clawback enforceability, draw recovery rules, minimum earnings requirements by rep location. Channel conflict screening results.

For the SIC governance committee — full package + recommendation

All of the above, plus: sensitivity analysis tornado chart (which parameters need CFO vs. VP sign-off), materiality determination (total ΔCost at P50 → which approval tier applies), implementation timeline and system build feasibility confirmation from the SAP Commissions team, and a clear go/no-go recommendation with conditions.

The capstone answer — say this to close the 30-minute walkthrough

The closing paragraph — memorize this

"Let me summarize the end-to-end. I start by decomposing the business question into three sub-problems: cost prediction, behavioral prediction, and ROI estimation — each requiring a different model family. I build the analytical base table in SQL at the rep-quarter grain, joining comp system data, CRM behavioral data, HR attributes, and market data. I engineer features that encode the causal mechanism — marginal incentive rate, distance to kink, quota difficulty index, sandbagging signal — not just raw variables. I train an OLS multivariate regression for cost prediction with quadratic and interaction terms to capture plan nonlinearity, a difference-in-differences model for causal behavioral elasticity estimation where a natural experiment exists, and a gradient boosted model for segment-level predictive attainment. I validate with temporal cross-validation, residual diagnostics, and VIF checks. I run scenario modeling for leadership, Monte Carlo for finance budget setting and reserve recommendations, and sensitivity analysis to prioritize the governance process. I predict five behavioral responses — including unintended ones like sandbagging and cap saturation — and make design recommendations for each. I run four legal pre-flight tests before any plan reaches the governance committee. And I package the output by stakeholder: scenario table for sales leadership, Monte Carlo and ROI for finance, compliance analysis for HR and legal, and the full package with a recommendation for the governance committee. That is how I develop a comp model. The whole pipeline takes three to four weeks in practice — and it's the analytical foundation that makes every plan change defensible, financially sound, and legally safe."

The sentence that gets you hired

After this walkthrough, add: "The DirecTV satellite outage model I built followed exactly this structure — business question decomposition, SQL base table across 50+ DMAs, feature engineering on outage severity and DMA-level penetration, multivariate regression with interaction terms, scenario and Monte Carlo simulation producing a $4M revenue-at-risk range, and a governance-ready output for leadership. The domain was different. The analytical discipline was identical." That bridge from your real work to the methodology is the evidence that everything you just described is not theoretical — you've done it.

End-to-end comp model walkthrough — 30-minute interview answer covering all modeling, ML, legal, and business domains