Methodology

How MedPlan Works

A transparent account of the data ingestion pipeline, feature normalisation framework, multi-dimensional benefit scoring model, and Pareto-optimal frontier identification that sits underneath every comparison MedPlan makes.


End-to-end pipeline

Every comparison a user sees is the output of a five-stage analytical pipeline that transforms raw insurer benefit tables into a ranked, personalised decision surface.

  01 Ingestion: Structured extraction of benefit schedules, excess schedules, and premium tables from all active Irish health insurers.
  02 Normalisation: Type-specific normalisation of each benefit dimension and ordinal encoding of categorical benefit levels onto a common [0, 100] scale.
  03 Scoring: Weighted multi-component utility function applied across multiple benefit dimensions to produce a composite benefit score.
  04 Frontier: Pareto dominance analysis across the (score, premium) space to identify the non-dominated efficient frontier.
  05 Personalisation: User-defined priority weight vector applied at query time to re-score plans and recompute the frontier without re-running the full pipeline.

1. Data ingestion & structuring

The Irish private health insurance market is governed under the Health Insurance Acts and supervised by the Health Insurance Authority (HIA). All insurers are required to publish benefit schedules, but no standardised machine-readable format exists across providers. MedPlan's ingestion layer parses the benefit documentation published by VHI, Laya Healthcare, Irish Life Health, and any future market entrants, mapping declared benefits onto a canonical schema of benefit dimensions, each one representing a clinically meaningful coverage category.

Raw values are heterogeneous: some dimensions are quantitative continuous values (e.g. nightly excess in euros, outpatient refund in euros), some are quantitative bounded percentages (e.g. percentage coverage of a hi-tech hospital consultant fee), and some are nominal categorical flags (e.g. whether a given procedure category is covered at all). The ingestion layer assigns each raw value to one of three primitive types: YN (binary coverage flag), PCT (percentage or fraction of covered cost), and EUR (absolute euro exposure). Each type is subsequently transformed by its own normalisation function before entering the scoring model.
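A minimal sketch of the type-assignment step, assuming raw schedule cells arrive as short strings such as "Yes", "80%" or "€1,500". The `BenefitType` enum mirrors the three primitive types named above; `RawBenefit`, `parse_raw_value` and the string heuristics are hypothetical illustrations, not MedPlan's actual parser.

```python
from dataclasses import dataclass
from enum import Enum


class BenefitType(Enum):
    """The three primitive types assigned at ingestion."""
    YN = "binary coverage flag"
    PCT = "percentage or fraction of covered cost"
    EUR = "absolute euro exposure"


@dataclass(frozen=True)
class RawBenefit:
    """Hypothetical canonical record for one (plan, dimension) value."""
    plan_id: str
    dimension: str
    btype: BenefitType
    value: float  # YN: 0.0 or 1.0; PCT: 0-100; EUR: euros


def parse_raw_value(plan_id: str, dimension: str, raw: str) -> RawBenefit:
    """Assign a raw schedule cell to YN, PCT or EUR using simple string heuristics."""
    text = raw.strip().lower()
    if text in {"yes", "y", "covered"}:
        return RawBenefit(plan_id, dimension, BenefitType.YN, 1.0)
    if text in {"no", "n", "not covered"}:
        return RawBenefit(plan_id, dimension, BenefitType.YN, 0.0)
    if text.endswith("%"):
        return RawBenefit(plan_id, dimension, BenefitType.PCT, float(text[:-1]))
    if text.startswith("€"):
        amount = text[1:].replace(",", "")
        return RawBenefit(plan_id, dimension, BenefitType.EUR, float(amount))
    # Anything else needs manual review rather than a guessed type.
    raise ValueError(f"unrecognised value for {dimension!r}: {raw!r}")


print(parse_raw_value("plan-a", "nightly_excess", "€1,500"))
```

Raising on unrecognised input, rather than defaulting to a type, keeps ambiguous schedule entries out of the scoring model until someone has looked at them.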

Plan activity status is tracked against the HIA register. Inactive plans — those no longer available for new members — are retained in the database for historical comparison but are excluded from all frontier and graphical outputs. Premium data is maintained on a rolling basis.

The accuracy of MedPlan's output is directly dependent on the accuracy of the benefit tables published by the insurers themselves. Where a benefit schedule contains an error, an omission, or a change that has not yet been reflected in the published documentation, the scoring model will reproduce that error faithfully. MedPlan reviews benefit schedules on a rolling basis and cross-references them against the HIA's published plan register, but we cannot guarantee that every plan is current at every point in time. Users are encouraged to verify material benefits directly with their insurer before switching plans.

Where a declared benefit is heavily qualified, it is treated as absent for scoring purposes. The scoring model operates on the principle that a conditional benefit — one whose availability depends on criteria that are either opaque, plan-specific, or likely to exclude a significant proportion of policyholders — does not represent the same coverage value as an unconditional one. For example, a plan that offers "private room in a private hospital at semi-private rates" will receive a score of zero for the Private Room / Private Hospital dimension, because the qualification materially undermines the stated benefit. This conservative encoding prevents plans from scoring well on coverage they do not reliably deliver.
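The conservative encoding rule can be sketched as a single guard. This assumes a reviewer has already flagged the benefit as heavily qualified (that judgment is not automated here); `DeclaredBenefit` and `scoring_value` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeclaredBenefit:
    """Hypothetical record: a benefit as declared, plus a reviewer-set qualification flag."""
    dimension: str
    score: float                    # normalised [0, 100] score if it were unconditional
    heavily_qualified: bool = False
    note: str = ""


def scoring_value(benefit: DeclaredBenefit) -> float:
    """Conservative encoding: a heavily qualified benefit is scored as absent."""
    return 0.0 if benefit.heavily_qualified else benefit.score


room = DeclaredBenefit(
    dimension="Private Room / Private Hospital",
    score=100.0,
    heavily_qualified=True,
    note="private room in a private hospital at semi-private rates",
)
print(scoring_value(room))  # 0.0
```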

Know of a benefit we've missed?

If an insurer offers a benefit that is not reflected in our analysis — whether because it does not appear on the published benefit schedule or because it was introduced after our last review — we want to know about it. Please use the feedback form to describe the benefit, the plan it applies to, and why you believe it should be included in the scoring model. We review all submissions and update the dataset where the evidence supports it.

2. Feature normalisation

Because dimensions carry different units and ranges, direct aggregation of raw values would produce a score dominated by high-variance features. To ensure each dimension contributes meaningfully to the composite score, a type-specific normalisation function maps all raw values onto a commensurable [0, 100] benefit score scale where higher is always better.

Hospital Tier Benefit Score $\mathrm{HTS}(\beta, \lambda_c, \lambda_n, \omega)$
$$\mathrm{HTS} = \max\left(\beta - \lambda_c\,\omega - \lambda_n\,\omega,\; 0\right)$$

$\beta$: declared base coverage percentage for this tier (normalised to [0, 100])

$\lambda_c$: copayment penalty, copay_eur × 0.01, representing the proportional cost burden of the per-admission excess

$\lambda_n$: nightly excess penalty, nightly_excess_eur × 0.15, representing the cumulative exposure over a typical admission

$\omega$: user-defined excess sensitivity weight in [0.0, 4.0]; amplifies both penalty terms when the user prioritises low-excess plans

The max(·, 0) clamp ensures the score is bounded below; a plan with very high excesses cannot contribute negative utility to a group's total score.
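A direct transcription of the HTS formula, using the constants stated above (0.01 per euro of copayment, 0.15 per euro of nightly excess). The function name and the range check on `omega` are illustrative choices, not MedPlan's code.

```python
def hospital_tier_score(
    beta: float,
    copay_eur: float,
    nightly_excess_eur: float,
    omega: float = 1.0,
) -> float:
    """HTS = max(beta - lambda_c * omega - lambda_n * omega, 0)."""
    if not 0.0 <= omega <= 4.0:
        raise ValueError("omega (excess sensitivity) must lie in [0.0, 4.0]")
    lambda_c = copay_eur * 0.01            # copayment penalty
    lambda_n = nightly_excess_eur * 0.15   # nightly excess penalty
    return max(beta - lambda_c * omega - lambda_n * omega, 0.0)


print(hospital_tier_score(100, copay_eur=0, nightly_excess_eur=0))              # 100.0
print(hospital_tier_score(80, copay_eur=0, nightly_excess_eur=100, omega=2.0))  # 50.0
print(hospital_tier_score(50, copay_eur=0, nightly_excess_eur=500, omega=4.0))  # 0.0 (clamped)
```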

Procedure Shortfall Score $\mathrm{PSS}(\delta, \lambda_e, \omega)$
$$\delta = \max\left(100 - \beta,\; q_{\mathrm{pct}} \times 100\right)$$
$$\mathrm{PSS} = \max\left(100 - \lambda_e\,\omega - \delta\,\omega,\; 0\right)$$

$\beta$: base tier coverage percentage. It establishes the implicit shortfall floor: even a plan with no quoted shortfall carries a gap of $100 - \beta$ points.

$q_{\mathrm{pct}}$: the insurer-quoted shortfall for the specific procedure category, if disclosed, expressed as a fraction (hence the × 100 scaling onto the 0–100 points scale)

$\delta$: effective shortfall, the greater of the structural tier gap and the quoted shortfall. Taking the maximum rather than the sum prevents double-penalisation and reflects that the base-coverage gap is a lower bound, not an additive term

$\lambda_e$: euro-exposure penalty, shortfall_amount_eur × shortfall impact factor, representing the expected out-of-pocket cost normalised to the [0, 100] score range

$\omega$: the same excess sensitivity weight as above, ensuring consistent user-priority amplification across all excess-related dimensions
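A sketch of the PSS calculation under two stated assumptions: `qpct` is a fraction in [0, 1] (which is what the × 100 scaling implies) and defaults to 0 when no shortfall is quoted, and `impact_factor` is a hypothetical stand-in for the unpublished shortfall impact factor.

```python
def procedure_shortfall_score(
    beta: float,
    shortfall_amount_eur: float,
    omega: float = 1.0,
    qpct: float = 0.0,
    impact_factor: float = 0.05,  # hypothetical value, not published
) -> float:
    """PSS = max(100 - lambda_e * omega - delta * omega, 0)."""
    # Effective shortfall: the larger of the structural tier gap and the quoted
    # shortfall. Taking the max (not the sum) avoids double-penalisation.
    delta = max(100.0 - beta, qpct * 100.0)
    lambda_e = shortfall_amount_eur * impact_factor  # euro-exposure penalty
    return max(100.0 - lambda_e * omega - delta * omega, 0.0)


# A quoted 5% shortfall on a 90% tier is absorbed by the 10-point tier gap...
print(procedure_shortfall_score(90, 0, qpct=0.05))  # 90.0
# ...but a quoted 20% shortfall exceeds it and becomes the binding term.
print(procedure_shortfall_score(90, 0, qpct=0.20))  # 80.0
```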

This piecewise normalisation approach — as opposed to global min–max scaling — is deliberate. Global normalisation would tie each plan's score to the extremes of the current market: if a new market entrant introduced an unusually high nightly excess, all existing plans would be rescaled upward. The per-formula approach ensures that a plan's score reflects its absolute coverage quality, not its ranking relative to a market distribution that changes each calendar year.
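To make the argument concrete, the sketch below compares market-relative min–max scaling with a fixed-form penalty (a simplified nightly-excess term with β = 100 and no copayment) when a hypothetical new entrant with a €300 nightly excess joins a three-plan market. The plan names and excess figures are invented for illustration.

```python
def minmax_scores(excess: dict[str, float]) -> dict[str, float]:
    """Market-relative min-max scaling onto [0, 100]; lower excess scores higher."""
    lo, hi = min(excess.values()), max(excess.values())
    span = (hi - lo) or 1.0  # avoid division by zero if all plans are equal
    return {plan: 100.0 * (hi - x) / span for plan, x in excess.items()}


def fixed_form_scores(excess: dict[str, float], omega: float = 1.0) -> dict[str, float]:
    """Absolute scoring: each plan's score ignores every other plan."""
    return {plan: max(100.0 - 0.15 * x * omega, 0.0) for plan, x in excess.items()}


market = {"A": 0.0, "B": 50.0, "C": 100.0}
with_entrant = {**market, "New": 300.0}

# Plan B's min-max score jumps from 50.0 to about 83.3 although B has not changed.
print(minmax_scores(market)["B"], minmax_scores(with_entrant)["B"])
# Its fixed-form score is 92.5 in both markets.
print(fixed_form_scores(market)["B"], fixed_form_scores(with_entrant)["B"])
```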

3. Composite benefit score & priority weights

Each normalised dimension score is assigned to one of seven benefit groups: Public Hospital Access, Private Hospital Access, Hi-Tech & Specialist Hospital Access, Specialist Procedures & Cardiac, Maternity & Fertility, Day-to-Day Outpatient, and Excess & Shortfall Exposure. The composite benefit score S for a plan p is computed as a weighted sum of the group-level subscores:

Composite Score
$$S(p, \mathbf{w}) = \sum_{g \in G} \omega_g \cdot \mathrm{score}_g(p)$$

$\mathbf{w}$: the user priority weight vector, one component per benefit group

$\omega_g$: the scalar weight for group $g$, resolved from the user's 1–5 priority slider via a non-linear mapping

$\mathrm{score}_g(p)$: the group subscore, itself a weighted mean of the individual dimension scores within group $g$
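A minimal sketch of the two-level aggregation: dimension scores are averaged within a group using within-group weights, and the group subscores are then combined with the group weights (ω_g in the formula). The within-group weights and dimension names in the example are hypothetical, since their values are not published here.

```python
from collections.abc import Mapping


def group_subscore(dim_scores: Mapping[str, float], dim_weights: Mapping[str, float]) -> float:
    """score_g(p): weighted mean of the normalised [0, 100] dimension scores in one group."""
    total = sum(dim_weights[d] for d in dim_scores)
    if total == 0:
        return 0.0
    return sum(dim_weights[d] * s for d, s in dim_scores.items()) / total


def composite_score(group_scores: Mapping[str, float], group_weights: Mapping[str, float]) -> float:
    """S(p, w): sum over benefit groups of omega_g * score_g(p)."""
    return sum(group_weights[g] * s for g, s in group_scores.items())


maternity = group_subscore(
    {"antenatal": 80.0, "fertility": 40.0},  # hypothetical dimensions
    {"antenatal": 3.0, "fertility": 1.0},    # hypothetical within-group weights
)
print(maternity)  # 70.0
print(composite_score(
    {"Maternity & Fertility": maternity, "Day-to-Day Outpatient": 50.0},
    {"Maternity & Fertility": 1.0, "Day-to-Day Outpatient": 1.0},
))  # 120.0
```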

Priority sliders accept ordinal inputs from 1 to 5, which are mapped onto a strictly convex weight schedule {1→0.0, 2→C₁, 3→C₂, 4→C₃, 5→C₄}. The convexity of this schedule is deliberate: the marginal impact of moving from priority 4 to 5 is four times that of moving from 2 to 3. This reflects the real-world decision structure — a user with a high-risk maternity profile needs the model to strongly surface maternity cover, not just modestly prefer it. The non-linear schedule avoids the saturation and insensitivity that a linear mapping would produce at the extremes.

A user who leaves all sliders at the default (3) receives a weight vector of all ones, so the composite score is an unweighted sum of the seven group subscores (equivalent, for ranking purposes, to an unweighted mean across benefit groups). This represents a neutral prior over benefit categories, appropriate when the user has no strong clinical or lifestyle preference, and ensures that the default output reflects genuine broad-coverage value rather than optimisation for any particular category.
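The slider-to-weight mapping can be sketched as a lookup table. Only two points are fixed by the text (priority 1 maps to 0.0, and the all-ones default implies priority 3 maps to 1.0); the other values below are hypothetical, chosen so the schedule is strictly convex and the 4→5 step is four times the 2→3 step.

```python
# Hypothetical schedule: only 1 -> 0.0 and 3 -> 1.0 follow from the text.
PRIORITY_WEIGHTS = {1: 0.0, 2: 0.25, 3: 1.0, 4: 2.5, 5: 5.5}

BENEFIT_GROUPS = [
    "Public Hospital Access",
    "Private Hospital Access",
    "Hi-Tech & Specialist Hospital Access",
    "Specialist Procedures & Cardiac",
    "Maternity & Fertility",
    "Day-to-Day Outpatient",
    "Excess & Shortfall Exposure",
]


def weights_from_sliders(sliders: dict[str, int]) -> dict[str, float]:
    """Resolve each group's 1-5 slider position to its scalar weight."""
    return {group: PRIORITY_WEIGHTS[level] for group, level in sliders.items()}


def is_strictly_convex(schedule: dict[int, float]) -> bool:
    """True if each step up the slider adds strictly more weight than the previous step."""
    values = [schedule[k] for k in sorted(schedule)]
    steps = [b - a for a, b in zip(values, values[1:])]
    return all(later > earlier for earlier, later in zip(steps, steps[1:]))


default = weights_from_sliders({g: 3 for g in BENEFIT_GROUPS})
print(set(default.values()))                 # {1.0}
print(is_strictly_convex(PRIORITY_WEIGHTS))  # True
```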

4. Score decomposition & explainability

The composite score is designed to be fully decomposable. For any pair of plans (a baseline plan and a candidate plan), MedPlan can compute a per-dimension contribution table that shows the raw benefit value, the normalised subscore, the applied weight multiplier, and the signed delta between the two plans for every dimension. This score breakdown is accessible directly from the comparison interface.

The contribution table serves a dual purpose. Firstly, it enables users to verify that the model is weighting their stated priorities correctly — a form of post-hoc interpretability analogous to SHAP (SHapley Additive exPlanations) in supervised learning contexts. Secondly, it surfaces the specific dimensions where one plan dominates another, giving users a defensible basis for switching rather than a black-box recommendation.
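A sketch of how such a contribution table might be built, assuming each dimension's effective multiplier (its group weight times its share of the within-group weights) is already known. Because the composite score is additive, the signed deltas sum exactly to the difference in composite scores, which is what makes the table a complete decomposition; for a purely additive score, each delta also coincides with the Shapley value computed against the baseline plan as the reference point. All names and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContributionRow:
    dimension: str
    raw_baseline: str
    raw_candidate: str
    score_baseline: float
    score_candidate: float
    multiplier: float  # effective weight applied to this dimension
    delta: float       # signed contribution: candidate minus baseline


def contribution_table(
    baseline: dict[str, tuple[str, float]],
    candidate: dict[str, tuple[str, float]],
    multipliers: dict[str, float],
) -> list[ContributionRow]:
    """One row per dimension, largest absolute swing first."""
    rows = []
    for dim, m in multipliers.items():
        raw_b, s_b = baseline[dim]
        raw_c, s_c = candidate[dim]
        rows.append(ContributionRow(dim, raw_b, raw_c, s_b, s_c, m, m * (s_c - s_b)))
    return sorted(rows, key=lambda r: abs(r.delta), reverse=True)


rows = contribution_table(
    baseline={"Nightly excess": ("€75", 88.75), "Outpatient refund": ("€25", 40.0)},
    candidate={"Nightly excess": ("€0", 100.0), "Outpatient refund": ("€40", 64.0)},
    multipliers={"Nightly excess": 1.0, "Outpatient refund": 0.5},
)
for r in rows:
    print(f"{r.dimension:<18} {r.raw_baseline:>5} -> {r.raw_candidate:<5} {r.delta:+.2f}")
print(sum(r.delta for r in rows))  # 23.25
```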

Score sensitivity to weight perturbation is implicitly tested by the priority sliders: because the model is re-evaluated at query time against live weight parameters, users can directly observe how much their final ranking changes in response to a unit shift in any benefit group weight — a lightweight form of parameter sensitivity analysis built into the UX.
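The same check can be run programmatically: re-rank plans before and after a change in a single group weight and report how far each plan moves. The plan data, and the weight of 2.5 standing in for a raised slider, are hypothetical.

```python
def rank_plans(group_scores: dict[str, dict[str, float]], weights: dict[str, float]) -> list[str]:
    """Plan ids ordered by composite score, best first."""
    def composite(plan: str) -> float:
        return sum(weights[g] * s for g, s in group_scores[plan].items())
    return sorted(group_scores, key=composite, reverse=True)


def rank_shift(group_scores: dict[str, dict[str, float]], weights: dict[str, float],
               group: str, new_weight: float) -> dict[str, int]:
    """Places gained (positive) or lost (negative) when one group weight changes."""
    before = rank_plans(group_scores, weights)
    after = rank_plans(group_scores, {**weights, group: new_weight})
    return {plan: before.index(plan) - after.index(plan) for plan in group_scores}


plans = {
    "A": {"Maternity & Fertility": 90.0, "Day-to-Day Outpatient": 40.0},
    "B": {"Maternity & Fertility": 50.0, "Day-to-Day Outpatient": 90.0},
}
neutral = {"Maternity & Fertility": 1.0, "Day-to-Day Outpatient": 1.0}
print(rank_plans(plans, neutral))                                # ['B', 'A']
print(rank_shift(plans, neutral, "Maternity & Fertility", 2.5))  # {'A': 1, 'B': -1}
```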

5. Pareto-optimal frontier identification

After scoring, the full set of active plans is projected into a two-dimensional decision space defined by the axes (annual premium, composite benefit score). Within this space, MedPlan applies a Pareto dominance criterion: plan A dominates plan B if and only if A costs no more than B and scores at least as high as B, with at least one strict inequality. The set of all non-dominated plans forms the efficient frontier — also known in multi-objective optimisation as the Pareto front — and constitutes the primary output of the Best Plans view.

Formally: a plan p is on the frontier if there exists no other plan q such that cost(q) ≤ cost(p) ∧ S(q) ≥ S(p) with at least one strict inequality. The computation is O(n²) in the number of active plans, which is tractable at the current market scale (approximately 300 active plans) and is re-executed on every request to ensure the frontier reflects the user's live weight vector.
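The dominance test and the O(n²) frontier scan translate almost line for line into code. The sketch below assumes each plan already carries its annual premium, its composite score under the current weights, and an `active` flag from the HIA register check (inactive plans are filtered out before the scan, as described under data ingestion). `ScoredPlan` and the example figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredPlan:
    plan_id: str
    premium: float  # annual premium, €
    score: float    # composite benefit score under the current weights
    active: bool = True


def dominates(q: ScoredPlan, p: ScoredPlan) -> bool:
    """q dominates p: no more expensive, at least as good, strictly better on one axis."""
    return (q.premium <= p.premium and q.score >= p.score
            and (q.premium < p.premium or q.score > p.score))


def pareto_frontier(plans: list[ScoredPlan]) -> list[ScoredPlan]:
    """O(n^2) pairwise check over active plans, returned cheapest first."""
    active = [p for p in plans if p.active]
    frontier = [p for p in active if not any(dominates(q, p) for q in active if q is not p)]
    return sorted(frontier, key=lambda p: p.premium)


plans = [
    ScoredPlan("A", 1200.0, 310.0),
    ScoredPlan("B", 1500.0, 300.0),
    ScoredPlan("C", 1800.0, 420.0),
    ScoredPlan("D", 900.0, 250.0),
    ScoredPlan("E", 1000.0, 500.0, active=False),  # would dominate A, B and C if active
]
print([p.plan_id for p in pareto_frontier(plans)])  # ['D', 'A', 'C']
```

For two objectives an O(n log n) alternative exists: sort by premium (ties broken by descending score) and keep each plan whose score strictly exceeds the best score seen so far, identical duplicates aside. At around 300 plans the pairwise scan is roughly 90,000 comparisons, so the simpler quadratic version costs little.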

A key property of the Pareto front is that the dominance criterion itself is weight-agnostic: it never decides how much premium a point of coverage is worth. What it does depend on is how coverage is scored, and there is no objectively correct way to collapse many benefit dimensions into a single score without a utility function. MedPlan's priority weights define that utility function. Changing the weights re-scores all plans and therefore changes the shape of the frontier: a plan that appears dominated under neutral weights may become non-dominated when a specific benefit group is given high priority, if it scores disproportionately well in that group. The frontier is therefore a preference-conditional optimal set, not an absolute ranking.

Frontier plans are further stratified into benefit tiers based on their composite score relative to the maximum achievable score in the current market. This tiering provides an intuitive entry point for users who want to anchor their choice to a rough budget level before reading the detailed frontier breakdown.

6. Scatter plot & symlog spatial transform

The scatter plot places every plan relative to a user-chosen baseline using a relative coordinate system: a plan's X position represents how much cheaper or more expensive it is per year versus the baseline (in euros), and its Y position represents the difference in composite benefit score. The baseline plan is always at the origin (0, 0).

Premium differences are projected onto the X-axis using a symmetric logarithmic (symlog) transform:

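Symlog transform
$$\operatorname{symlog}(v) = \operatorname{sgn}(v) \cdot \log_{10}\!\left(1 + \frac{|v|}{C}\right)$$

$v$: raw annual premium difference (€) between the candidate and baseline plans, signed so that a cheaper candidate gives a positive value (baseline premium minus candidate premium)

$C$: linear threshold constant ($C$ = €100); within ±€100 of the baseline the scale is approximately linear, preserving fine resolution around zero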

Beyond ±C, the scale transitions to logarithmic compression, preventing large premium differences from dominating the visual space and obscuring the dense cluster of plans near the user's current price point. The transform is monotone and preserves sign, so the Sweet Spot quadrant (top-right) always represents plans that are simultaneously cheaper and better-scoring.
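A sketch of the X-axis transform and its inverse (useful for labelling ticks in euros). The `x_position` helper assumes, consistent with the Sweet Spot sitting top-right, that v is the annual saving: baseline premium minus candidate premium. Function names are hypothetical.

```python
import math

C_EUR = 100.0  # linear threshold constant C from the text


def symlog(v: float, c: float = C_EUR) -> float:
    """sgn(v) * log10(1 + |v| / c): near-linear around zero, logarithmic well beyond ±c."""
    return math.copysign(math.log10(1.0 + abs(v) / c), v)


def inverse_symlog(y: float, c: float = C_EUR) -> float:
    """Map an axis position back to euros, e.g. to label ticks."""
    return math.copysign(c * (10.0 ** abs(y) - 1.0), y)


def x_position(baseline_premium: float, candidate_premium: float) -> float:
    # Assumed sign convention: positive = cheaper than baseline, so the
    # top-right quadrant holds plans that are cheaper and better-scoring.
    return symlog(baseline_premium - candidate_premium)


print(symlog(900.0), symlog(-900.0))                      # 1.0 -1.0
print(round(symlog(50.0), 3), round(symlog(5000.0), 3))   # 0.176 1.708
```

The last line shows the compression: a hundredfold difference in euros (€50 versus €5,000) becomes less than a tenfold difference in axis position.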

The Y-axis does not require a non-linear transform because score differences are already bounded: the maximum possible Y displacement equals the total composite score range (approximately 0–500 raw score units, displayed after normalisation). Focus view mode further clips the rendered range to plans within €300/yr of the baseline and within 30 score points below it, reducing visual clutter and ensuring the most actionable plans are prominent.
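The Focus view clip reduces to a simple predicate on each plan's offset from the baseline. The sketch assumes that plans scoring above the baseline are always kept (the text only bounds the downside at 30 points) and works in raw euros and score points before any symlog transform; the function name is hypothetical.

```python
def in_focus_view(
    delta_premium_eur: float,
    delta_score: float,
    max_premium_gap_eur: float = 300.0,
    max_score_drop: float = 30.0,
) -> bool:
    """Keep plans within ±€300/yr of the baseline and no more than 30 points below it."""
    return abs(delta_premium_eur) <= max_premium_gap_eur and delta_score >= -max_score_drop


offsets = {"A": (250.0, -10.0), "B": (-350.0, 5.0), "C": (100.0, -31.0), "D": (0.0, 80.0)}
print([p for p, (dp, ds) in offsets.items() if in_focus_view(dp, ds)])  # ['A', 'D']
```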

By the numbers

  • N benefit dimensions scored
  • 7 benefit groups
  • 300+ active plans analysed
  • 4 Irish insurers covered
  • O(n²) frontier algorithm complexity
  • 5 priority levels per group

Limitations & design constraints

MedPlan scores plans on the basis of structured benefit data. It intentionally excludes several factors that are genuinely important to some users but are either impossible to quantify consistently across all plans or are highly individual in nature:

  • Network adequacy — the geographic distribution of consultants and facilities that accept a given plan is not modelled. A plan with high theoretical cover for a specific procedure may have limited in-network providers in a given region.
  • Claims experience — insurer claims-handling quality, approval rates for specific procedures, and pre-authorisation requirements are not captured in published benefit schedules and are not part of the scoring model.
  • Wellness and minor benefits — the model deliberately under-weights ancillary wellness benefits (gym contributions, dental discounts, minor day-care items) relative to major clinical categories. A plan with a generous optical benefit but limited private hospital access will correctly score lower than one that prioritises the reverse.
  • Individual health history — the model applies population-level benefit weights. It cannot and does not account for a user's specific medical history, ongoing treatment requirements, or the likelihood of needing any particular benefit category.

These are deliberate scope constraints, not oversights. The design principle is that MedPlan should be a high-quality first filter: one that removes the dominated options and surfaces the plans worth examining. It is not a definitive buy recommendation for any plan. The final decision should involve reading the benefit schedule for any shortlisted plan and, where appropriate, consulting a licensed health insurance broker and/or contacting the insurer directly.

