I turn messy, multi-source data into clear models, risk ranges, and recommendations you can act on.
Led by Joseph Martin, Founder & Lead Consultant.
Case studies with quantified results.
Reproducible workflows (R/Python).
Advanced statistical training.
Free · no prep needed.
How I Work
Intro call (20 min)
Discovery Sprint (D0)
Project delivery (D1+)
Handoff
Discovery Sprint (D0) — 1–2 weeks, fixed-fee.
Feasibility, early findings, and a concrete plan/quote for delivery. Learn more on Services
After You Book
Short intake form.
20-min call to clarify goals & data reality.
If there’s a fit, I’ll propose a fixed-fee Discovery Sprint (D0).
Data is de-identified by default. Nothing shared without permission. Read Full Policy
Selected work across pay equity, pricing/ROI, clinical risk, geospatial access, and advanced methods. Where client disclosure isn’t possible, I include privacy-preserving examples, anonymized work, or research demos with public/simulated data—built with the same rigor end-to-end. Don’t see what you need? Let's talk →
At a glance: $14,280 average pay gap; adjusted model shows persistent gap by role & tenure.
Outcome: Focused adjustments where impact is highest, plus reproducible checks.
View case study →
Equity · Randomization test + OLS (HC3) · D0: 1–2 weeks
At a glance: Recommended $700k (vs $500k) for lower volatility and consistently positive ROI.
Outcome: Clear year-one budget choice with risk/return ranges.
View case study →
Pricing/ROI · OLS + Prediction Intervals · D0: 1–2 weeks
At a glance: Interpretable 3-feature model (engine size, horsepower, weight) with price estimates.
Outcome: Pricing guidance for unlisted vehicles and sales strategy.
View case study →
Pricing/ROI · Multiple Regression · D0: 1–2 weeks
At a glance: ~50% of target population within 30-min walk; ~30% county reach by 30-min bus.
Outcome: Recommended site adjacent to intercity transit for equitable access.
View case study →
Geospatial · K-means + GIS/isochrones · D0: 1–2 weeks
At a glance: BMI associated with up to ~17× higher odds of leak (logistic model).
Outcome: Risk stratification to support pre-op planning and counseling.
View case study →
Healthcare · Logistic Regression · D0: 1–2 weeks
At a glance: +12 percentage points in strike calls at 3-ball counts; –24 percentage points at two strikes (relative to a ~47% baseline).
Outcome: Quantified context bias; framework generalizes to other human-in-the-loop decisions.
View case study →
Behavioral · Logistic Regression (marginal effects) · D0: 1–2 weeks
At a glance: Mean error ~0.5 m; distance-specific prediction intervals from 0.1 to 8 km.
Outcome: Uncertainty-aware guardrails for real-time tuning.
View case study →
Advanced Methods · Kernel Regression + Bootstrap PIs · D0: 1–2 weeks
At a glance: ~95% re-sim agreement with target energy metrics; no brute-force search.
Outcome: Faster parameter inference to guide experimental design.
View case study →
Advanced Methods · Invertible Neural Network · D0: 1–2 weeks
Hi, I’m Joseph Martin. I help organizations make confident decisions from imperfect data. My work spans pay-equity audits, pricing and ROI models, clinical risk prediction, geospatial access planning, and inverse-problem modeling for research.
Clarity over complexity: plain-language summaries with quantified uncertainty.
Reproducible by default: versioned notebooks, documented assumptions.
Right-sized solutions: models that match the decision, not the syllabus.
Data ethics & privacy: de-identification first, least-access needed.
1. Intro call (20 min)
Goals, stakes, data reality.
No prep needed; quick fit check.
2. Discovery Sprint (D0)
1–2 weeks, fixed‑fee: data intake & feasibility, early findings, and approach.
Decision brief + plan/quote for D1+.
3. Project delivery (D1+)
Analysis/modeling with review checkpoints; clear acceptance criteria.
Decision-ready briefs; clear options with trade-offs.
4. Handoff
Reproducible code/notebooks, visuals, and a “what to monitor next” note.
Optional support window.
Discovery Sprint (D0) details on the Services page. →
NSF-funded research collaboration in statistical modeling.
Applied data science to nonprofit resource planning.
Presented applied work at academic conferences.
Advanced training in Mathematics and Statistics.
Tooling
R, Python, tidyverse, scikit-learn, Quarto/Markdown, Git, SQL.
Data handling approach and retention are outlined here: Data & Privacy
I offer practical data support for teams that need clarity—not complexity. Whether you’re planning spend, validating fairness, forecasting outcomes, or siting resources, I help translate messy data into decisions.
How I Work (at a glance)
Intro call (20 min)
Discovery Sprint (D0)
Project delivery (D1+)
Handoff
Discovery Sprint (D0) — 1–2 weeks, fixed-fee.
Feasibility, early findings, decision brief, and a concrete plan/quote for delivery.
Decision Analytics & Modeling
Good for: “What’s driving results?” “What if we change X?”
Deliverables: drivers & elasticities, scenario ranges, decision brief, notebook.
Pay Equity & Risk Audits
Good for: internal compensation reviews, regulatory readiness, fairness checks.
Deliverables: methodology memo, effect sizes & uncertainty, remediation options.
Pricing, Forecasting & ROI
Good for: ad-spend planning, pricing bands, budgeting.
Deliverables: forecast ranges, sensitivity tables, “guardrails” for decisions.
Geospatial & Access Planning
Good for: site selection, resource placement, equity analysis.
Deliverables: maps, cluster logic, accessibility metrics, recommended site(s).
Technical Collaboration (Research & Prototyping)
Good for: Inverse problems, experimental design, method validation.
Deliverables: reproducible code, experiment plan, evaluation report.
How I Work — full process
1. Intro call (20 min)
Goals, stakes, data reality.
No prep needed; quick fit check.
2. Discovery Sprint (D0)
1–2 weeks, fixed‑fee: data intake & feasibility, early findings, and approach.
Decision brief + plan/quote for D1+.
3. Project delivery (D1+)
Analysis/modeling with review checkpoints; clear acceptance criteria.
Plain‑language summaries with uncertainty ranges.
4. Handoff
Reproducible code/notebooks, visuals, and a “what to monitor next” note.
Optional support window.
See Data & Privacy for access, retention, and NDA details.
Have a question or a dataset you’re wrestling with? I’m happy to take a look.
Free · no prep needed.
After You Book
Short intake form.
20-min call to clarify goals & data reality.
If there’s a fit, I’ll propose a fixed-fee Discovery Sprint (D0).
Data & NDA
Data is de-identified by default. Nothing shared without permission. Read Full Policy
This page summarizes how I handle data access, security, and retention during and after a project. I’m happy to align to reasonable client policies and to sign your NDA.
De‑identification first: remove direct identifiers where feasible; prefer aggregates and sampling for D0.
Minimum‑necessary access: start read‑only; least‑privilege principles.
Secure transfer: SFTP/SharePoint/Drive links provided; avoid email attachments with sensitive data.
Reproducibility: version‑controlled code/notebooks; assumptions documented.
Confidentiality: nothing shared outside your org without written permission.
Access is scoped to project needs and revoked at close.
Credentials are stored in approved password managers only (no plaintext).
Multi‑factor authentication used where supported.
Happy to review and execute an NDA prior to receiving materials.
Statements of Work document acceptance criteria and deliverables.
Project data is deleted or returned within 30 days of project close, unless otherwise required or agreed in writing.
Logs and derived, non‑identifying artifacts (e.g., code templates) may be retained.
I do not operate as a covered entity. HIPAA/PHI handling is only supported with appropriate agreements and client‑approved secure environments.
Questions about data handling or NDAs? Contact me · Book a 20-min intro call
TIME TO VALUE: 1–2 weeks
Outcome
Evidence that borderline strike calls are systematically shifted by count state: more balls → higher strike likelihood; more strikes → lower strike likelihood. These effects are large enough to change at-bat outcomes in aggregate and are presented with clear, interpretable deltas from the baseline ~47% strike rate in the judgment zone.

Figure 1 — Strike probability by ball count, stratified by strike count
At two strikes (blue line), umpires are significantly less likely to call a strike, while the likelihood of a strike call steadily increases with the number of balls (horizontal axis).
Decision question
Do umpires adjust borderline strike calls based on the current count (balls/strikes), and by how much, relative to a neutral baseline?
Approach
Fit a logistic regression on judgment-zone pitches (just in/out of the rulebook zone).
Predict strike-call probabilities across all ball × strike states; report changes vs. baseline.
Emphasize count effects (balls, strikes) while holding location at “borderline” and controlling for nuisance factors where available.
Technical note: Logit link; coefficients summarized as percentage-point deltas from the judgment-zone baseline. Robust SEs; marginal effects computed for each count. Results shown as partial-dependence curves and bar-chart deltas.
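As a minimal sketch of this setup (the file and column names below are hypothetical, not the project's actual inputs), the core model in Python/statsmodels might look like:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Judgment-zone pitches only; file and column names are placeholders.
pitches = pd.read_csv("judgment_zone_pitches.csv")

# Logistic regression of the strike call on count state.
# C(balls) / C(strikes) treat counts as categories, so each state
# gets its own effect rather than a forced linear trend.
fit = smf.logit("strike_call ~ C(balls) + C(strikes)",
                data=pitches).fit(cov_type="HC1")

# Average marginal effects: percentage-point shifts in P(strike)
# per count state, averaged over the sample.
print(fit.get_margeff(at="overall").summary())
```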
Findings
Ball count effect:
Each additional ball increases strike probability by ≈ +2.5 percentage points (typically +1 to +4).
At 3 balls, probability can rise by ≈ +12 percentage points vs baseline.
Strike count effect:
Each additional strike decreases strike probability by ≈ 11 percentage points (typically 9 to 12).
At 2 strikes, the reduction can reach ≈ 24 percentage points.

Figure 2 — Marginal effects: +1 ball vs +1 strike
One additional ball increases the strike‐call probability by 2.5 percentage points, whereas one additional strike decreases it by 11.0 percentage points (baseline: 0 balls, 0 strikes).
Interpretation: Patterns are consistent with contextual bias—a tendency to “restore balance”: more lenient toward pitchers in high-ball counts and toward batters in high-strike counts.
If this were your org (D0 Discovery Sprint — 1–2 weeks, fixed-fee)
Purpose: De-risk scope, surface blockers, and deliver first answers—so we can make a concrete plan and quote for D1+ (the deliverables phase that follows Discovery).
Length: 1–2 weeks (calendar), ~12–30 focused hours.
Inputs: 30–60 min kickoff, read-only sample data, business goal(s), and must-have decisions/dates. For umpire decision modeling, that might include pitch-level data with location (rulebook zone flag, x/y coordinates), call outcome (ball/strike), count state (balls/strikes), pitcher/batter IDs, and game context (inning, score, home/away).
You get (D0 outputs):
Data Intake & Feasibility Memo — what exists, quality, gaps, risks.
Early Findings — preliminary evidence, visuals, and concise takeaways.
Analytical Approach — what I’d explore next (and alternatives), with assumptions.
Decision Brief — which decisions can be addressed with current data, and what additional inputs are needed.
D1+ Plan & Fixed Quote — scope, milestones, acceptance criteria, cost.
Pricing: Discovery Sprints are fixed fee. D1+ Delivery is scoped and quoted after D0.
Scope your D0 plan
What’s next (D1+ Delivery)
With feasibility confirmed, D1+ is the substantive project phase that turns Discovery into concrete outcomes and deliverables. For example:
Production-grade bias model with dashboards (count-state partials, confidence bands, game/ump splits).
Context controls (pitch type, handedness, catcher framing, location density) and fairness decomposition (within-ump vs between-ump).
Scenario analysis (how bias shifts run expectancy; team and league strategy implications).
Monitoring (weekly drift, crew-level reports) and documentation (model card, tests, reproducible notebooks).
Project context
Type: Independent applied research. Role: Lead analyst. Designed as a reproducible analysis. For client work, your data and artifacts remain yours and are never shared externally; no PII is required for this analysis. Access is read-only.
Free · no prep needed.
TIME TO VALUE: 1–2 weeks
Outcome
Confirmed a material gender pay gap: raw average difference $14,280. After adjusting for role, level, tenure, and department, a statistically significant gap persisted, concentrated in specific role–tenure bands.
Decision Question
Is there evidence of a gender pay gap in this dataset—or could the observed difference be explained by random variation?
Approach
Randomization test to estimate how often a gap this large appears if pay were truly equal.
Adjusted model controlling for role, level, tenure, department to separate composition from pay effects.
Technical note: permutation resampling for the null; OLS with HC3 SEs; cluster-robust option if needed.
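A minimal sketch of both steps, assuming a tidy table with hypothetical columns salary, gender, role, level, tenure, and dept:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical columns: salary, gender ("M"/"F"), role, level, tenure, dept
df = pd.read_csv("pay_data.csv")

# Observed raw gap in mean salary
obs_gap = (df.loc[df["gender"] == "M", "salary"].mean()
           - df.loc[df["gender"] == "F", "salary"].mean())

# Randomization test: shuffle gender labels to simulate truly equal pay
rng = np.random.default_rng(42)
labels = df["gender"].to_numpy()
null_gaps = np.empty(10_000)
for i in range(null_gaps.size):
    perm = rng.permutation(labels)
    null_gaps[i] = (df.loc[perm == "M", "salary"].mean()
                    - df.loc[perm == "F", "salary"].mean())
p_value = np.mean(np.abs(null_gaps) >= abs(obs_gap))

# Adjusted model: OLS with HC3 robust standard errors
fit = smf.ols("salary ~ gender + C(role) + C(level) + tenure + C(dept)",
              data=df).fit(cov_type="HC3")

print(f"raw gap = ${obs_gap:,.0f}, permutation p = {p_value:.3f}")
print(fit.summary())
```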
Findings
Observed (raw) gap: males earned $14,280 more on average.
Likelihood under equal pay: ~1.4%, empirical p ≈ 0.014.
Adjusted estimate: statistically significant gap remains after controlling for role, level, tenure, and department.
Concentration: gap is largest in specific role–tenure bands and a small set of departments.

Figure 1 — Randomization distribution of pay gaps under equal pay
A histogram of permuted pay differences with the observed gap marked. Only ~1.4% of null resamples exceed the observed gap.
Interpretation
The pattern is unlikely to be random: if pay were truly equal, a gap this large would appear only ~1.4% of the time by chance. The adjusted model indicates a persistent gap after accounting for role and tenure, suggesting targeted remediation is warranted (not across-the-board).
If this were your org (D0 Discovery Sprint — 1–2 weeks, fixed-fee)
Purpose: De-risk scope, surface blockers, and deliver first answers—so we can make a concrete plan and quote for D1+ (the deliverables phase that follows Discovery).
Length: 1–2 weeks (calendar), ~12–30 focused hours.
Inputs: 30–60 min kickoff, read-only sample data, business goal(s), and must-have decisions/dates. For pay equity, that might include salaries, role/level, tenure, department, location, and FTE status.
You get (D0 outputs):
Data Intake & Feasibility Memo — what exists, quality, gaps, risks.
Early Findings — preliminary evidence, visuals, and concise takeaways.
Analytical Approach — what I’d explore next (and alternatives), with assumptions.
Decision Brief — which decisions can be addressed with current data, and what additional inputs are needed.
D1+ Plan & Fixed Quote — scope, milestones, acceptance criteria, cost.
Pricing: Discovery Sprints are fixed fee. D1+ Delivery is scoped and quoted after D0.
Scope your D0 plan
What’s next (D1+ Delivery)
With feasibility confirmed, D1+ is the substantive project phase that turns Discovery into concrete outcomes and deliverables. For example:
Adjusted pay-equity analysis at scale (full dataset, role/level/location controls).
Scenario modeling for targeted adjustments with cost/impact trade-offs.
Guardrail policies for offers/promotions; reviewer checklists.
Monitoring & dashboards (quarterly/annual pay-equity checks).
Handover & documentation (code/notebooks, reproducible reports, acceptance tests).
Project context
Type: Independent applied research. Role: Lead analyst. Designed as a reproducible analysis. For client work, your data and artifacts remain yours and are never shared externally; no PII is required for this analysis. Access is read-only.
Free · no prep needed.
TIME TO VALUE: 1–2 weeks
Outcome
Recommend a $700k first-year ad budget: it meaningfully reduces downside risk while maintaining attractive upside. The $500k option shows higher potential peak ROI but includes a non-trivial chance of a first-year loss.
Decision Question
Which budget level ($500k vs $700k) maximizes expected ROI while keeping the probability and magnitude of loss within acceptable bounds?
Approach
Fit a spend→revenue model to historical data to estimate marginal return and uncertainty.
Generate 95% prediction intervals (PIs) for first-year revenue at $500k and $700k budgets.
Convert revenue PIs to ROI ranges and compare risk/return profiles.
Technical note: OLS with HC3 robust SEs; prediction intervals from the fitted model. Sensitivity checks include leave-one-out fits, an optional log-linear spec (multiplicative effects), and a nonlinearity screen for diminishing returns.
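A minimal sketch of the interval-to-ROI conversion (the file name, column names, and $k units are assumptions for illustration):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical history: one row per period, spend and revenue in $k
hist = pd.read_csv("spend_revenue.csv")

fit = smf.ols("revenue ~ spend", data=hist).fit(cov_type="HC3")

# 95% prediction intervals for first-year revenue at the candidate budgets
budgets = pd.DataFrame({"spend": [500, 700]})  # $k
pred = fit.get_prediction(budgets).summary_frame(alpha=0.05)

# Convert revenue PIs to ROI ranges: ROI = (revenue - spend) / spend
for b, lo, hi in zip(budgets["spend"], pred["obs_ci_lower"], pred["obs_ci_upper"]):
    print(f"${b}k: revenue PI [{lo:,.0f}, {hi:,.0f}] "
          f"-> ROI [{(lo - b) / b:+.1%}, {(hi - b) / b:+.1%}]")
```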
Findings
Marginal return: For each $1 of ad spend, expected revenue increase ≈ $0.97 to $1.20 (95% CI).
$500k budget: Predicted revenue $490k–$870k → ROI −2% to 74%. Contains a below-break-even region.
$700k budget: Predicted revenue $709k–$1.088M → ROI 1.2% to 55.4%. Removes loss in the 95% range and tightens spread.
Interpretation
If you value loss-avoidance and planning certainty in year 1, $700k is the efficient choice. If you are explicitly risk-seeking for higher upside and can tolerate a small chance of loss, $500k is defensible—but should be paired with guardrails (see D1+).

Figure 1 — Spend→Revenue model and budget markers
A line plot fit to historical data with 95% confidence band, showing average ROI across different ad spends. Vertical lines at $500k and $700k.

Figure 2 — Predicted year-1 revenue by advertising budget (95% prediction intervals)
Vertical bars show projected first-year revenue of $680k (for a $500k budget) and $898k (for a $700k budget), with intervals indicating plausible ranges. Dashed lines mark the budget break-even points.
If this were your org (D0 Discovery Sprint — 1–2 weeks, fixed-fee)
Purpose: De-risk scope, surface blockers, and deliver first answers—so we can make a concrete plan and quote for D1+ (the deliverables phase that follows Discovery).
Length: 1–2 weeks (calendar), ~12–30 focused hours.
Inputs: 30–60 min kickoff, read-only sample data, business goal(s), and must-have decisions/dates. For ad spend ROI, that might include historical revenue & ad spend (monthly/quarterly), channel tags (e.g., search/social/display), campaign metadata, seasonality/event notes, and basic site conversions (if available).
You get (D0 outputs):
Data Intake & Feasibility Memo — what exists, quality, gaps, risks.
Early Findings — preliminary evidence, visuals, and concise takeaways.
Analytical Approach — what I’d explore next (and alternatives), with assumptions.
Decision Brief — which decisions can be addressed with current data, and what additional inputs are needed.
D1+ Plan & Fixed Quote — scope, milestones, acceptance criteria, cost.
Pricing: Discovery Sprints are fixed fee. D1+ Delivery is scoped and quoted after D0.
Scope your D0 plan
What’s next (D1+ Delivery)
With feasibility confirmed, D1+ is the substantive project phase that turns Discovery into concrete outcomes and deliverables. For example:
ROAS and expected ROI ranges across budget scenarios.
Diminishing-returns curve (log/logit/Hill or spline) and budget optimizer (expected value vs. probability-of-loss).
Channel mix analysis (incremental ROAS; reallocation scenarios).
Staged-ramp plan with stop-loss thresholds and weekly checkpoints.
Monitoring dashboard (forecast vs actuals; drift & alerting).
Handover & documentation (code/notebooks, reproducible reports, acceptance tests).
Project context
Type: Independent applied research. Role: Lead analyst. Designed as a reproducible analysis. For client work, your data and artifacts remain yours and are never shared externally; no PII is required for this analysis. Access is read-only.
Free · no prep needed.
TIME TO VALUE: 1–2 weeks
Outcome
An interpretable pricing model that reliably estimates fair value for unlisted vehicles and clarifies the three drivers that matter most—engine size, horsepower, and vehicle weight—so sales teams can price with confidence and defend decisions.
Decision question
Which vehicle attributes most strongly drive price, and can we produce trustworthy price estimates for specific configurations to guide list price and discount guardrails?
Approach
Develop a transparent multiple regression of (log) price on vehicle features.
Diagnose and address nonlinearity and multicollinearity (VIF-based pruning, transformations).
Reduce to a compact, defensible feature set with strong signal + high interpretability.
Technical note: OLS on log(price); centering/scaling for comparability; HC3 robust SEs; VIF screening; AIC/BIC and out-of-sample checks to guard against overfit.
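A minimal sketch of the VIF screen and log-price fit, with hypothetical file and column names:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical columns: price, engine_l, horsepower, weight_lb
cars = pd.read_csv("vehicle_sales.csv")

# Center/scale features so standardized effects are comparable
X = cars[["engine_l", "horsepower", "weight_lb"]]
Xz = sm.add_constant((X - X.mean()) / X.std())

# VIF screen: values well above ~5-10 flag redundant predictors
for i in range(1, Xz.shape[1]):
    print(Xz.columns[i], round(variance_inflation_factor(Xz.values, i), 2))

# OLS on log(price) with HC3 robust standard errors
fit = sm.OLS(np.log(cars["price"]), Xz).fit(cov_type="HC3")
print(fit.summary())

# Point estimate for one configuration (standardized the same way)
new = pd.DataFrame({"engine_l": [3.5], "horsepower": [210], "weight_lb": [4210]})
new_z = sm.add_constant((new - X.mean()) / X.std(), has_constant="add")
print("expected price ≈", float(np.exp(fit.predict(new_z))[0]))
```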
Findings
Key drivers (standardized effects): horsepower ↑, vehicle weight ↑, engine size ↓. The negative engine-size coefficient likely reflects configuration/efficiency tradeoffs once horsepower and weight are controlled.
Model parsimony: Variables such as hybrid status, wheelbase, highway MPG, and cylinder count did not improve fit materially and were excluded.
Config. estimate (example): non-hybrid, 3.5L, 210 HP, 4,210 lb → $41,000–$46,000, with an expected price of $43,000.

Figure 1 — Standardized effect sizes (points with 95% CIs)
Horsepower and vehicle weight have a stronger relative influence on price than engine size. Engine size showed a negative effect on price, potentially reflecting less efficient or outdated configurations.

Figure 2 — Price estimate for a specific configuration (3.5L, 210 horsepower, 4,210-pound vehicle)
The predicted sale price range is $41,000 to $46,000, with an expected sale price of $43,000.
If this were your org (D0 Discovery Sprint — 1–2 weeks, fixed-fee)
Purpose: De-risk scope, surface blockers, and deliver first answers—so we can make a concrete plan and quote for D1+ (the deliverables phase that follows Discovery).
Length: 1–2 weeks (calendar), ~12–30 focused hours.
Inputs: 30–60 min kickoff, read-only sample data, business goal(s), and must-have decisions/dates. For vehicle pricing, that might include historical sales transactions, list vs. sold price, vehicle attributes (engine size, horsepower, weight, trim/features), and market context (year, mileage, region).
You get (D0 outputs):
Data Intake & Feasibility Memo — what exists, quality, gaps, risks.
Early Findings — preliminary evidence, visuals, and concise takeaways.
Analytical Approach — what I’d explore next (and alternatives), with assumptions.
Decision Brief — which decisions can be addressed with current data, and what additional inputs are needed.
D1+ Plan & Fixed Quote — scope, milestones, acceptance criteria, cost.
Pricing: Discovery Sprints are fixed fee. D1+ Delivery is scoped and quoted after D0.
Scope your D0 plan
What’s next (D1+ Delivery)
With feasibility confirmed, D1+ is the substantive project phase that turns Discovery into concrete outcomes and deliverables. For example:
Production-grade pricing model (parsimonious, audited, reproducible) with confidence/PI bands surfaced to users.
Price recommendation tool for unlisted vehicles (config inputs → fair value + explainers).
Guardrails & policy: discount thresholds by segment/trim; exception workflow.
Calibration & validation: predicted vs actuals; drift monitoring; periodic re-fit schedule.
Documentation & handover: code/notebooks, tests, model card, playbook for updates.
Project context
Type: Independent applied research. Role: Lead analyst. Designed as a reproducible analysis. For client work, your data and artifacts remain yours and are never shared externally; no PII is required for this analysis. Access is read-only.
Free · no prep needed.
TIME TO VALUE: 1–2 weeks
Outcome
An interpretable, publication-ready risk model for anastomotic leak after colectomy that surfaces BMI category as a dominant driver while retaining clinically important covariates. The model supports surgical planning (optimization, counseling, monitoring level) and can be shipped as a bedside risk calculator with defensible odds ratios and clear uncertainty.
Decision question
Can we provide reliable patient-level risk estimates for anastomotic leak—prior to surgery—that clinicians can trust and explain, with BMI handled using standard clinical bands?
Approach
Fit a transparent logistic regression to a cohort of n = 180 colectomy patients.
Predictors: BMI (clinical categories), age, serum albumin, operative duration, tobacco use, gender, and comorbidities.
Categorical BMI bands (Underweight → Obese III) to align with clinical publishing norms and bedside interpretability.
Report odds ratios (ORs) with 95% CIs; profile-likelihood intervals where separation risk appears.
Two worked patient profiles illustrate risk shifts across BMI categories (healthy vs high-risk).
Technical note: canonical logit link (no linear-probability shortcut); multicollinearity checks; robust SEs. Underweight had no observed leaks (non-estimable OR → marked explicitly). Model intended for calibration checks and bootstrap validation in D1+.
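For illustration, a minimal sketch of the odds-ratio table (column names are hypothetical; a separation-prone group like Underweight would need profile-likelihood or exact methods rather than the Wald intervals shown here):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical columns: leak (0/1), bmi_band, age, albumin, op_hours, tobacco, sex
cohort = pd.read_csv("colectomy_cohort.csv")

# Treatment coding against "Normal" makes each BMI coefficient, once
# exponentiated, an odds ratio versus the Normal-BMI reference group.
formula = ("leak ~ C(bmi_band, Treatment(reference='Normal'))"
           " + age + albumin + op_hours + tobacco + sex")
fit = smf.logit(formula, data=cohort).fit()

# Odds ratios with 95% Wald CIs (exponentiated coefficients)
ci = fit.conf_int()
or_table = pd.DataFrame({
    "OR": np.exp(fit.params),
    "CI_low": np.exp(ci[0]),
    "CI_high": np.exp(ci[1]),
})
print(or_table.round(2))
```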
Findings
BMI is a strong predictor.
Obese III: ~17× higher odds vs Normal BMI.
Overweight: ~5× higher odds vs Normal.
Underweight: no leaks observed → not estimable.
Other significant covariates:
Age: ~10% higher odds per additional year (p = 0.001).
Albumin: ~78% lower odds per +1 g/dL (p = 0.004).
Operative time: ~40% higher odds per +1 hour (p = 0.043).
Tobacco use and gender increased odds but were not statistically definitive in this sample.

Figure 1 — Leak risk by BMI category (odds ratios, 95% CIs)
Odds ratios (OR) of anastomotic leak risk relative to Normal BMI. No leaks observed in Underweight group → OR not estimable (denoted with an X-shaped marker).
Patient profiles (what this means at the bedside)
Healthy 35-year-old, no comorbidities: Leak risk rises with BMI but remains <1% even at Obese III. (Figure 2)
High-risk 62-year-old, smoker, diabetic: Baseline risk ~15% at Normal BMI; ~75% at Obese III. (Figure 3)

Figure 2 — Predicted leak probabilities by BMI (healthy profile)
Risk remains low across the BMI spectrum. Bars labeled with exact probabilities; values <0.1% marked as “<0.1%” with minimum bar height.

Figure 3 — Predicted leak probabilities by BMI (high-risk profile)
Risk increases substantially with higher BMI, reaching ~75% in the Obese III category. Bars labeled with exact probabilities; values <0.1% marked as “<0.1%” with minimum bar height.
If this were your org (D0 Discovery Sprint — 1–2 weeks, fixed-fee)
Purpose: De-risk scope, surface blockers, and deliver first answers—so we can make a concrete plan and quote for D1+ (the deliverables phase that follows Discovery).
Length: 1–2 weeks (calendar), ~12–30 focused hours.
Inputs: 30–60 min kickoff, read-only sample data, business goal(s), and must-have decisions/dates. For surgical risk modeling, that might include patient demographics (age, sex), BMI category, lab values (e.g., serum albumin), comorbidities, operative factors (duration, approach, tobacco use), and post-op leak outcomes for model training.
You get (D0 outputs):
Data Intake & Feasibility Memo — what exists, quality, gaps, risks.
Early Findings — preliminary evidence, visuals, and concise takeaways.
Analytical Approach — what I’d explore next (and alternatives), with assumptions.
Decision Brief — which decisions can be addressed with current data, and what additional inputs are needed.
D1+ Plan & Fixed Quote — scope, milestones, acceptance criteria, cost.
Pricing: Discovery Sprints are fixed fee. D1+ Delivery is scoped and quoted after D0.
Scope your D0 plan
What’s next (D1+ Delivery)
With feasibility confirmed, D1+ is the substantive project phase that turns Discovery into concrete outcomes and deliverables. For example:
Validated clinical model: internal bootstrap validation, calibration curve, Brier score; ROC/PR AUC with class-imbalance handling.
Risk calculator & nomogram: bedside and EHR-embedded versions; BMI banding preserved; explainers (what pushes risk up/down).
Decision thresholds & policies: leak-risk cutoffs tied to actions (optimization, diversion/temporary stoma consideration, ICU/step-down monitoring).
External validation plan: prospective registry or multi-site dataset; drift monitoring and re-fit cadence.
Documentation & handover: code/notebooks, model card, acceptance tests, reproducible report suitable for clinical publishing.
Project context
Type: Independent applied research. Role: Lead analyst. Designed as a reproducible analysis. For client work, your data and artifacts remain yours and are never shared externally; no PII is required for this analysis. Access is read-only.
Free · no prep needed.
TIME TO VALUE: 1–2 weeks
Outcome
A defensible, data-backed hub location in south Oxnard near a major intercity transit hub that places:
~50% of Oxnard’s target population within a 30-minute walk, and
~30% of the county’s target population within a 30-minute bus ride.
This choice reflects the strong convergence of poverty and infant density and prioritizes walkability and transit access for underserved families.

Figure 1 — Proposed site & access radius (walk) + transit hub
Zoomed Oxnard map showing candidate site (white square), intercity transit hub (gray triangle), and ~30-minute walk circle.
Decision question
Where should a single diaper-distribution hub be sited to maximize equitable access—balancing poverty burden, infant population, and mobility constraints (walkability and transit)?
Approach
Compile U.S. Census indicators at the tract level: total population, infant population, poverty rate, tract area, and geography.
Use choropleth mapping to surface county-wide need patterns.
Focus analysis on Oxnard, where high poverty and infant density overlap most.
Apply K-means clustering on poverty rate, infant counts, and tract coordinates (lat/long) to identify high-need clusters.
Compute a data-weighted centroid of the top-need cluster (weights: infant population × poverty) to propose a candidate site.
Assess access via a 30-minute walk radius and 30-minute bus reach anchored on proximity to the intercity transit hub.
Technical note: Tract features standardized prior to K-means; coordinate features handled in a projected CRS for distance stability. Cluster count chosen via elbow/silhouette checks. Access metrics reported as coverage shares (target population within isochrones/thresholds).
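A minimal sketch of the clustering and weighted-centroid steps (column names are hypothetical, and coordinates are assumed already projected to meters):

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical columns: poverty_rate, infants, x_m, y_m (projected coords)
tracts = pd.read_csv("tracts.csv")

# Standardize so poverty, infant counts, and location contribute
# on comparable scales.
feats = ["poverty_rate", "infants", "x_m", "y_m"]
Z = StandardScaler().fit_transform(tracts[feats])

# k would be chosen via elbow/silhouette checks in practice.
tracts["cluster"] = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(Z)

# Highest-need cluster by mean need score, then a need-weighted centroid
# (weights: infant population x poverty rate) as the candidate site.
tracts["need"] = tracts["infants"] * tracts["poverty_rate"]
top = tracts.groupby("cluster")["need"].mean().idxmax()
hi = tracts[tracts["cluster"] == top]
site = np.average(hi[["x_m", "y_m"]], weights=hi["need"], axis=0)
print(f"candidate site (projected m): x={site[0]:.0f}, y={site[1]:.0f}")
```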
Findings
County view: Poverty and infant density jointly highlight Oxnard as the principal locus of need.


Figure 2 & Figure 3 — County indicators (poverty & infant population)
Side-by-side shaded tract maps showing population under 200% Federal Poverty Level (Figure 2) and infant population (Figure 3). Together, they highlight a strong convergence of need in the city of Oxnard.

Figure 4 — Oxnard high-need zones (poverty + high-infant tracts)
Tracts ranking in the top 25% of infant population (blue) overlaid on a poverty-rate map. The convergence of these two indicators identifies the core neighborhoods most in need of service access.
Oxnard focus: K-means identifies a central/south Oxnard cluster as highest need.

Figure 5 — K-means clusters of need (Oxnard)
The highest-need cluster (in blue) includes central/south Oxnard neighborhoods. Tracts colored by cluster using poverty, infants, and location.
Proposed site: The weighted centroid falls in south Oxnard, adjacent to a major intercity transit hub.
Access impact:
Walkability: ~50% of Oxnard’s target population within 30 minutes on foot.
Transit: ~30% of the county’s target population reachable within 30 minutes by bus.
Equity read: The site lies inside overlapping high-infant population and high-poverty tracts, aligning service with concentrated need while improving first-mile access.
If this were your org (D0 Discovery Sprint — 1–2 weeks, fixed-fee)
Purpose: De-risk scope, surface blockers, and deliver first answers—so we can make a concrete plan and quote for D1+ (the deliverables phase that follows Discovery).
Length: 1–2 weeks (calendar), ~12–30 focused hours.
Inputs: 30–60 min kickoff, read-only sample data, business goal(s), and must-have decisions/dates. For equitable hub placement, that might include U.S. Census tract indicators (infant population, poverty rate, total population, geography) alongside client-specific data such as existing distribution points, partner agencies, facility constraints, and service priorities (e.g., walkability vs. transit reach).
You get (D0 outputs):
Data Intake & Feasibility Memo — what exists, quality, gaps, risks.
Early Findings — preliminary evidence, visuals, and concise takeaways.
Analytical Approach — what I’d explore next (and alternatives), with assumptions.
Decision Brief — which decisions can be addressed with current data, and what additional inputs are needed.
D1+ Plan & Fixed Quote — scope, milestones, acceptance criteria, cost.
Pricing: Discovery Sprints are fixed fee. D1+ Delivery is scoped and quoted after D0.
Scope your D0 plan
What’s next (D1+ Delivery)
With feasibility confirmed, D1+ is the substantive project phase that turns Discovery into concrete outcomes and deliverables. For example:
Facility-location optimization: max-coverage / p-median variants with capacity, budget, and equity weights; multi-site “what-if”s.
Network isochrones & GTFS transit modeling: real walk times and bus travel times (time-of-day aware), including headways and transfers.
Scenario planning: sensitivity to different weights (poverty vs infant counts), demographic growth, service hours, and safety/lighting.
Access dashboards: coverage KPIs (walk and transit), neighborhood summaries, and “who gets helped more/less” equity cut.
Stakeholder-ready artifacts: annotated maps, reproducible notebooks, model card, and a siting playbook with acceptance tests.
Project context
Type: Independent applied research. Role: Lead analyst. Designed as a reproducible analysis. For client work, your data and artifacts remain yours and are never shared externally; no PII is required for this analysis. Access is read-only.
Free · no prep needed.
TIME TO VALUE: 1–2 weeks
Outcome
A credible inverse model that infers viable laser parameter sets directly from desired proton energy spectra—avoiding exhaustive brute-force search. When inferred parameters are re-simulated through the physics forward model, the target energy outputs are reproduced with ~95% accuracy across total, maximum, and average proton-energy metrics, enabling rapid design-space exploration for experiment planning.



Figure 1 — Predicted vs target energy metrics (total, max, average)
Points (in black) represent reconstructed energy values generated from inferred laser parameters; 45° blue line indicates perfect agreement; log scale for clarity.
Decision question
Can we recover laser settings that achieve specified proton energy targets quickly and defensibly—without scanning millions of configurations—and do so in a way scientists can validate and trust?
Approach
Train an Invertible Neural Network (INN) to learn a non-bijective mapping between laser parameters ↔ proton spectra using a physics-based forward model to generate paired simulations.
Use exact inversion to sample plausible parameter sets given a target spectrum, then re-simulate via the original physics model to verify outputs.
Embrace the many-to-one nature of the inverse problem by producing ensembles of solutions rather than a single point estimate.
Technical note: Coupling-layer INN with change-of-variables training; losses incorporate forward-consistency and inverse-consistency. Inputs/outputs standardized; prior bounds enforced on physical parameters. Evaluation uses re-simulation agreement.
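For intuition, a minimal sketch in PyTorch of one affine coupling block, the invertible building unit this kind of INN stacks; dimensions and layer sizes are illustrative, not the project's architecture:

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One affine coupling block: invertible by construction."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(x1).chunk(2, dim=1)
        s = torch.tanh(s)               # keep scales numerically stable
        y2 = x2 * torch.exp(s) + t      # transform second half
        return torch.cat([x1, y2], dim=1)

    def inverse(self, y):
        y1, y2 = y[:, :self.half], y[:, self.half:]
        s, t = self.net(y1).chunk(2, dim=1)
        s = torch.tanh(s)
        x2 = (y2 - t) * torch.exp(-s)   # exact algebraic inverse
        return torch.cat([y1, x2], dim=1)

# Round-trip check: inverse(forward(x)) recovers x up to float error
block = AffineCoupling(dim=8)
x = torch.randn(16, 8)
assert torch.allclose(block.inverse(block(x)), x, atol=1e-5)
```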
Figure 2 — Inverse-model workflow (INN → laser parameters → physics re-sim)
The INN receives desired energy outputs and generates plausible laser parameter sets, which are validated by re-simulating through a physics-based model.
Findings
The INN reconstructs target spectra reliably when inferred parameters are re-simulated, achieving ~95% agreement across total, maximum, and average energy metrics.
The method identifies multiple valid configurations that meet the same target—useful when operational constraints (e.g., safety limits, hardware ranges) need flexibility.
Compared with brute-force searches, the inverse model materially reduces compute/search effort, enabling faster iteration in experimental planning.
If this were your org (D0 Discovery Sprint — 1–2 weeks, fixed-fee)
Purpose: De-risk scope, surface blockers, and deliver first answers—so we can make a concrete plan and quote for D1+ (the deliverables phase that follows Discovery).
Length: 1–2 weeks (calendar), ~12–30 focused hours.
Inputs: 30–60 min kickoff, read-only sample data, business goal(s), and must-have decisions/dates. For inverse-design modeling, that might include physics-model outputs (paired laser settings ↔ energy spectra), desired target spectra or operating regimes, and lab-specific constraints such as hardware limits, safety tolerances, and feasible parameter ranges.
You get (D0 outputs):
Data Intake & Feasibility Memo — what exists, quality, gaps, risks.
Early Findings — preliminary evidence, visuals, and concise takeaways.
Analytical Approach — what I’d explore next (and alternatives), with assumptions.
Decision Brief — which decisions can be addressed with current data, and what additional inputs are needed.
D1+ Plan & Fixed Quote — scope, milestones, acceptance criteria, cost.
Pricing: Discovery Sprints are fixed fee. D1+ Delivery is scoped and quoted after D0.
Scope your D0 plan
What’s next (D1+ Delivery)
With feasibility confirmed, D1+ is the substantive project phase that turns Discovery into concrete outcomes and deliverables. For example:
Production-grade inverse tool: INN + sampler producing parameter ensembles that satisfy target spectra under explicit physical bounds.
Optimization & constraints: multi-objective selection of candidates (e.g., energy targets + safety margins + hardware limits).
Validation suite: systematic re-simulation tests, calibration plots, acceptance tests, and drift monitoring for evolving regimes.
Integration & UX: lightweight UI/CLI for target-setting and export of candidate configurations; reproducible notebooks; model card.
Handover & documentation: code, tests, training data recipes, and an operations playbook.
Project context
Type: Collaborative applied research. Role: Lead Analyst. Designed as a reproducible analysis. For client work, your data and artifacts remain yours and are never shared externally; no PII is required for this analysis. Access is read-only.
Collaboration & support: Conducted with a physics research group at The Ohio State University; supported by the National Science Foundation (NSF).
Free · no prep needed.
TIME TO VALUE: 1–2 weeks
Outcome
A nonparametric model that captures a highly nonlinear, oscillatory error pattern with growing amplitude over distance, delivering usable prediction intervals at operational ranges. This supports real-time targeting adjustments with defensible uncertainty bands and avoids misleading extrapolation beyond the validated range (0–8 km).

Figure 1 — Positioning error vs distance
Kernel regression estimates of error from 0 to 8 km (blue line). Vertical bars represent prediction intervals of positioning error at 0.1, 1.0, and 8.0 km. The shape of the model reflects the highly nonlinear, oscillatory error pattern in the observed data (gray points).
Decision question
How does tracking error evolve with distance in fast-tracking scenarios, and can we provide distance-specific prediction intervals to guide tuning and guardrails during live operations?
Approach
Diagnose shape: exploratory plots revealed wave-like, amplitude-increasing error vs. distance and heteroskedasticity.
Fit nonparametric kernel regression (no imposed functional form) to capture oscillation and local curvature.
Quantify uncertainty at 0.1 km, 1.0 km, and 8.0 km using bootstrap resampling of residuals, producing prediction intervals at critical ranges.
Enforce no extrapolation beyond 8.0 km (training support), where uncertainty balloons.
Technical note: Local-weighted kernel smoother with bandwidth tuned by out-of-sample error; residual bootstrap for PIs; caution flags for boundary regions; results reported on the original meter scale.
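A minimal sketch of the smoother and residual bootstrap on simulated stand-in data (the bandwidth and data-generating process are placeholders; a production version would resample residuals locally to respect the distance-dependent variance):

```python
import numpy as np

def nw_smooth(x_query, x, y, bw):
    """Nadaraya-Watson kernel regression with a Gaussian kernel."""
    w = np.exp(-0.5 * ((x_query[:, None] - x[None, :]) / bw) ** 2)
    return (w * y).sum(axis=1) / w.sum(axis=1)

rng = np.random.default_rng(0)
# Placeholder data: distance (km) vs positioning error (m) with
# oscillation and variance that grow with range.
x = np.sort(rng.uniform(0, 8, 400))
y = 0.5 * np.sin(3 * x) * (1 + x / 4) + rng.normal(0, 0.2 + 0.1 * x)

bw = 0.2  # in practice, tuned by out-of-sample error
grid = np.array([0.1, 1.0, 8.0])
fitted = nw_smooth(x, x, y, bw)
resid = y - fitted

# Residual bootstrap: refit on resampled residuals, then add a residual
# draw so intervals cover new observations (prediction intervals),
# not just the mean curve.
boot = np.empty((2000, grid.size))
for b in range(2000):
    y_star = fitted + rng.choice(resid, size=resid.size, replace=True)
    boot[b] = nw_smooth(grid, x, y_star, bw) + rng.choice(resid, size=grid.size)
lo, hi = np.percentile(boot, [2.5, 97.5], axis=0)
for d, l, h in zip(grid, lo, hi):
    print(f"{d:.1f} km: 95% PI [{l:+.2f} m, {h:+.2f} m]")
```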
Findings
Average error ≈ 0.5 m over the validated range, with distance-specific prediction intervals of tracking error:
0.1 km: −0.99 m to 0.91 m
1.0 km: −1.06 m to 0.77 m
8.0 km: 6.81 m to 9.02 m (clear growth in magnitude at range)
The oscillatory structure (growing amplitude) is captured without over-smoothing, yielding more realistic bounds than polynomial or linear fits would provide.
Operational guidance: tighten tolerances at short range; apply wider, distance-aware guardrails as range increases; avoid decisions that rely on predictions beyond 8 km absent new data.
If this were your org (D0 Discovery Sprint — 1–2 weeks, fixed-fee)
Purpose: De-risk scope, surface blockers, and deliver first answers—so we can make a concrete plan and quote for D1+ (the deliverables phase that follows Discovery).
Length: 1–2 weeks (calendar), ~12–30 focused hours.
Inputs: 30–60 min kickoff, read-only sample data, business goal(s), and must-have decisions/dates. For high-velocity tracking systems, that might include time-synced telemetry (estimated position/velocity), ground-truth references (range instrumentation or high-precision GPS), and operational bounds (validated range, safety constraints, error definitions).
You get (D0 outputs):
Data Intake & Feasibility Memo — what exists, quality, gaps, risks.
Early Findings — preliminary evidence, visuals, and concise takeaways.
Analytical Approach — what I’d explore next (and alternatives), with assumptions.
Decision Brief — which decisions can be addressed with current data, and what additional inputs are needed.
D1+ Plan & Fixed Quote — scope, milestones, acceptance criteria, cost.
Pricing: Discovery Sprints are fixed fee. D1+ Delivery is scoped and quoted after D0.
Scope your D0 plan
What’s next (D1+ Delivery)
With feasibility confirmed, D1+ is the substantive project phase that turns Discovery into concrete outcomes and deliverables. For example:
Production nonparametric model with distance-aware uncertainty (online inference; bounded outside training support).
Adaptive bandwidth & drift monitoring to sustain accuracy as operating conditions change.
Decision guardrails: thresholds that translate PIs into operational actions (tighten/relax tolerance windows by distance).
Calibration & validation: rolling backtests, coverage diagnostics for PIs, boundary checks.
Documentation & handover: notebooks, tests, model card, and a playbook for extending to new ranges/sensors.
Project context
Type: Independent applied research. Role: Research lead. Designed as a reproducible analysis. For client work, your data and artifacts remain yours and are never shared externally; no PII is required for this analysis. Access is read-only.
Free · no prep needed.