NNI MA HCP Screener — Documentation

Overview

A stateless REST API that takes a single convention-booth attendee and tries to identify them as a true Healthcare Professional (HCP) — returning degree and specialty when possible.

The upstream pipeline has two earlier identification stages: Level 1 matches against a pharma customer database (CDMI), and Level 2 attempts a direct NPI Registry exact match. This screener is Level 3 — the fallback when both upstream tiers fail.

What "HCP" means here

Any licensed clinician identifiable via degree (MD / DO / DDS / DMD / DPM / DPT / PharmD / DNP / OD) or credential string (RN / NP / PA / CNM / CRNA / RPh, etc.). NPI is a strong corroborating signal but not a hard requirement — research nurses, educators, and pharmacists in non-billing roles can be HCPs without being enumerated in NPPES.

Endpoint

POST https://nni-ma-hcp-screener.netlify.app/api/screen

Also reachable directly at POST /.netlify/functions/screen (the /api/screen path is a Netlify redirect).

Stateless by design. The screener stores nothing. No call log, no cache, no audit trail. Each request is independent.

Quickstart

Identify a single attendee with one HTTP call.

curl -X POST "https://nni-ma-hcp-screener.netlify.app/api/screen" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_API_KEY_HERE" \
  -d '{
        "firstName":      "Ilene",
        "lastName":       "Friedman",
        "state":          "NY",
        "city":           "Northport",
        "postalCode":     "11768",
        "specialty_hint": "Internal Medicine"
      }'

const res = await fetch('https://nni-ma-hcp-screener.netlify.app/api/screen', {
  method:  'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-API-Key':    'YOUR_API_KEY_HERE'
  },
  body: JSON.stringify({
    firstName:      'Ilene',
    lastName:       'Friedman',
    state:          'NY',
    city:           'Northport',
    postalCode:     '11768',
    specialty_hint: 'Internal Medicine'
  })
});
const data = await res.json();
console.log(data.match.status, data.match.confidence, data.match.npi);

import requests

resp = requests.post(
  "https://nni-ma-hcp-screener.netlify.app/api/screen",
  headers={
    "Content-Type": "application/json",
    "X-API-Key":    "YOUR_API_KEY_HERE",
  },
  json={
    "firstName":      "Ilene",
    "lastName":       "Friedman",
    "state":          "NY",
    "city":           "Northport",
    "postalCode":     "11768",
    "specialty_hint": "Internal Medicine",
  },
  timeout=30,
)
data = resp.json()
print(data["match"]["status"], data["match"]["confidence"], data["match"].get("npi"))

A successful response (HTTP 200) wraps the result in a small envelope:

{
  "ok": true,
  "match":      { "status": "matched", "confidence": 0.95, ... },
  "alternates": [ ... ],
  "tiers_run":  ["A"],
  "input_echo": { "firstName": "Ilene", "lastName": "Friedman", "state": "NY", "city": "Northport" }
}

Authentication

Every request must carry an X-API-Key header.

X-API-Key: 449ba0...e74650ebad948

Without the header (or with a wrong value) the function returns HTTP 401 with { "ok": false, "error": "Invalid or missing X-API-Key header." }.

The key is stored as the SCREENER_API_KEY env var on the Netlify site. To rotate: generate a new value, PUT it to the same env var via the Netlify API, redeploy, notify upstream.

This API key gates Serper credit consumption. Treat it as a budget control, not just an authentication token. Every successful call past Tier A spends Serper credits.

Request schema

JSON body, application/json. Only first + last name are strictly required. The richer the input, the better the disambiguation.

Field	Required?	Type / format	Purpose & normalization
firstName	Required	string	Lowercased + whitespace-collapsed. Trailing credentials are auto-stripped (`"Anthony MD"` → `"Anthony"`, credentials captured separately).
lastName	Required	string	Same normalization. Hyphens preserved; splits into variants in Tier A (`"Smith-Jones"` → `["Smith-Jones", "Smith Jones", "Smith", "Jones"]`).
middleName (or `middleInitial`)	Optional	string	Disambiguation aid; passed to the LLM enrichment context. (The web console makes this a required field — enter "none" if none.)
organization (or `employer`)	Recommended	string	Strongest disambiguator for non-HCPs. Drives LinkedIn- and employer-site-led Tier B queries and the LLM enrichment. Also substring-matched against NPI practice-address line 1 (`+0.08` on hit). The web console requires it (enter "none" if unknown).
suffix	Optional	string	Jr / Sr / III / etc. Currently echoed in the input.
email	Optional	string	Lowercased. Domain used as a corroborating employer signal.
city	Optional	string	Used as a geo signal. Adds `+0.10` to Tier A score on match. Also used by Tier B classifier to disambiguate namesakes in different cities.
state	Optional	2-letter code	Uppercased + trimmed. Filters initial Tier A passes. Adds `+0.20` on match, subtracts `−0.25` on mismatch. Tier B classifier uses it for geo alignment.
postalCode (or `zip`)	Optional	5-digit string	Non-digits stripped, sliced to first 5. Adds `+0.10` on exact match against NPI practice ZIP.
specialty_hint (or `specialtyHint`)	Optional	string	Substring-matched (either direction) against NPI primary taxonomy description. Adds `+0.15` on match, `−0.03` on soft mismatch.
convention	Optional	string	Free-form metadata. Currently not used in scoring; preserved for future analytics.
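
The per-field normalization in the table above can be sketched in Python. This is a minimal illustration under stated assumptions, not the screener's own code: the function name `normalize_request` and the `ALIASES` map are hypothetical, and credential stripping is left out.

```python
import re

# Hypothetical alias map, taken from the "(or ...)" notes in the table above.
ALIASES = {
    "middleInitial": "middleName",
    "employer": "organization",
    "zip": "postalCode",
    "specialtyHint": "specialty_hint",
}


def normalize_request(body):
    """Apply the documented per-field normalization to a raw request body."""
    out = {}
    for key, value in body.items():
        out[ALIASES.get(key, key)] = value

    for field in ("firstName", "lastName"):
        if not str(out.get(field, "")).strip():
            raise ValueError(f"{field} is required")
        # Lowercase and collapse runs of whitespace.
        out[field] = re.sub(r"\s+", " ", str(out[field])).strip().lower()

    if out.get("state"):
        out["state"] = str(out["state"]).strip().upper()
    if out.get("postalCode"):
        # Strip non-digits, then keep the first five.
        out["postalCode"] = re.sub(r"\D", "", str(out["postalCode"]))[:5]
    if out.get("email"):
        out["email"] = str(out["email"]).strip().lower()
    return out
```

A ZIP+4 value such as "11768-1234" comes out as "11768", and a `zip` key is folded into `postalCode`.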

Minimum vs. recommended input

Minimum: { "firstName": "...", "lastName": "..." } — works but low-confidence (no geo signals).

Recommended: First + Last + postalCode + state + city + organization (+ specialty_hint when known). For common names — and for non-HCPs at small specialized meetings with no badge data — organization is the single most valuable disambiguator.

Response schema

Every successful response shares a common envelope. Status-specific fields nest under match.

{
  "ok": true,
  "match": {
    "status":     "matched" | "inconclusive" | "not_hcp" | "unknown",
    "confidence": 0.0 - 1.0,
    "source":     "npi_direct" | "npi_fuzzy" | "npi_fuzzy_spell_corrected" | "tier_b_web_search",
    "tier":       "A" | "B",
    // ... status-specific fields below ...
    "match_reasoning": "human-readable string explaining why this disposition"
  },
  "alternates": [ /* up to 4 runner-up matches */ ],
  "tiers_run":  ["A"] | ["A", "B"] | ["A", "B", "A-retry"],
  "input_echo": { "firstName": "...", "lastName": "...", "state": "...", "city": "..." }
}

Source values

Source	Meaning	Trust
`npi_direct`	NPI Registry returned this person on a literal first+last variant. No fuzzy expansion needed.	Highest
`npi_fuzzy`	NPI Registry match via a name variant (nickname, hyphen split, wildcard). Still NPI-authoritative.	High
`npi_fuzzy_spell_corrected`	Input had a typo; Tier B suggested a correction; Tier A retry with the corrected spelling hit. Returns the corrected NPI record.	High
`tier_b_web_search`	No reliable NPI hit; classification came from web search results. Used for non-NPI HCPs (research nurses, etc.) and for `not_hcp` dispositions.	Lowest — adjudicate

Match-object field reference

Field	Type	Where it comes from	Notes
`status`	string	Disposition logic	`matched / inconclusive / not_hcp / unknown`
`confidence`	0.0–1.0	Scoring	≥0.85 high · 0.65–0.84 medium · <0.65 inconclusive
`source`	string	Pipeline branch	`npi_direct / npi_fuzzy / npi_fuzzy_spell_corrected / tier_b_web_search`
`tier`	"A" | "B"	Pipeline branch	Which tier produced the match
`npi`	string | null	NPPES	10-digit NPI; null for Case-3 non-NPI HCPs
`name_prefix`	string | null	NPPES `basic.name_prefix`	e.g. "DR."
`firstName / middleName / lastName / name_suffix`	strings	NPPES	Raw NPPES casing
`credential`	string | null	NPPES	e.g. "M.D.", "RN", "PA-C"
`gender`	"M" | "F" | null	NPPES	Self-reported
`job_title`	string | null	Composite: credential + taxonomy	e.g. `"Internal Medicine, M.D."`. Synthesized; not a raw NPPES field.
`job_description`	string | null	NPPES taxonomy desc	Same value as `primary_taxonomy.name`, surfaced as a friendly alias.
`primary_taxonomy`	object	NPPES	`{ code, name, license, license_state }`
`secondary_taxonomies`	array	NPPES	Multi-specialty providers' additional taxonomies
`practice_address`	object	NPPES address with `purpose=LOCATION`	`{ street, street_2, city, state, zip, phone, fax, formatted }`
`mailing_address`	object | null	NPPES address with `purpose=MAILING`	Same shape as practice_address; null if same as practice
`additional_practice_locations`	array	NPPES `practiceLocations`	Multi-site practitioners
`place_of_employment`	object | null	Tier B inference	`{ name, source, evidence_url }`, extracted from web snippets matching "at <Org>" / LinkedIn patterns. `source` is always `"tier_b_web_inference"` and never authoritative.
`job_title_inferred`	object | null	Tier B inference	Same shape; only set when LinkedIn-style "Title · Org" pattern fires.
`enumerated_since`	YYYY-MM-DD	NPPES	When the NPI was issued
`last_updated`	YYYY-MM-DD	NPPES	When the NPPES record was last touched
`match_signals`	array of strings	Scoring	Which Δ-score signals fired
`match_reasoning`	string	Function	Human-readable explanation
`flags`	array	Function	e.g. `["NOT_IN_NPI_REGISTRY"]`, `["ROLE_NON_CLINICAL"]`
`spell_corrected`	object | absent	Case 3b only	`{ from, to }`
`tier_b_evidence`	array	Tier B classifier	Up to 3 supporting web hits (Cases 1, 2, 3)
Enrichment fields (v0.3.x) — derived after Tier A + Tier B by the LLM layer
`is_hcp`	boolean	NPPES / LLM	HCP facet. `true` if the person holds a real clinical credential or practices clinically (definitive on an NPI/credential match), regardless of their current job title. Independent of `is_executive`.
`is_executive`	boolean	LLM	Executive facet. `true` for a leadership / administrative / business role (CEO, VP, Director, Dean, Chief Nursing Officer, …). Independent of `is_hcp` — a credentialed executive (e.g. a Chief Nursing Officer who is an RN/DNP) is both `true`.
`current_employment` + `current_employment_source`	string | null	Tier B regex / LLM	Current employer. Prefers the Tier-B-inferred `place_of_employment.name` (`_source: "tier_b_web_inference"`); for non-HCPs with no such match, falls back to the LLM-extracted employer (`"llm_generated:<model>"`), e.g. a CEO's company. Null when no employer is evident.
`current_position` + `current_position_source`	string | null	LLM (employer portal) / NPPES	Current job title/role. The LLM sources it from the employer's own official website / leadership page first (then curated directories / LinkedIn, then press), using one clean canonical title and never merging stale or multi-source titles, e.g. "Group Senior Vice President, Chief Nursing Executive". A bare NPPES credential (e.g. "DNP") is never used as a role.
`highest_education` + `highest_education_source`	string | null	NPPES credential / LLM	Highest level of education / terminal degree, e.g. `"Doctor of Medicine (M.D.)"`. Deterministic from a clear NPPES credential (`_source: "nppes_credential"`); otherwise LLM-inferred from web evidence (`"llm_generated:<model>"`), e.g. a "Professor of Medicine" → "Doctorate (M.D. and/or Ph.D.)".
`position_function` + `position_function_source`	string | null	LLM	1–2 sentence plain-language explanation of what someone in this position/specialty typically does, for a non-clinical reader. `_source: "llm_generated:<model>"`. Null when no position was identified or the LLM layer is unavailable.
`call_recommendation` + `call_recommendation_source`	object | null	LLM	`{ should_call, category, rationale, confidence }`: whether NNI medical affairs should call on this person even if they are not an HCP, e.g. a professor of medicine, dean / department chair, KOL / thought leader, podcaster or online influencer, P&T / formulary committee member, professional-association member, patient-advocacy NGO / non-profit member, researcher, or payer. See the category enum below. `_source` carries provenance or an `unavailable: <reason>` note.

NPPES has no native "employer" or "job title" fields. The place_of_employment, current_employment, and job_title_inferred values come from Tier B web search (with an LLM fallback for current_employment) and are best-effort. Treat them as hints, not authoritative facts. The job_title and job_description fields are synthesized from NPPES credential + taxonomy and are deterministic for any NPI match; current_position, by contrast, is LLM-derived as described in the field reference above.

position_function and call_recommendation are LLM-generated (added v0.3.0) from a single live model call per request. They are advisory judgments, not facts — call_recommendation in particular is a triage hint for whom to approach, not a vetted determination. The screener degrades gracefully: if the LLM layer is unconfigured or fails, both fields return null with a reason in *_source and the rest of the result is unaffected.

call_recommendation.category enum: HCP, KOL/Influencer, P&T/Formulary Committee, Patient Advocacy/NGO, Professional Association, Researcher/Academic, Payer/Managed Care, Internal/Own-Company, Industry/Other, Not Recommended, Unknown.

Status values

matched — a single identity meets the confidence threshold. Cases 0, 1, 3, 3b, and 4 produce this.
inconclusive — a top candidate exists but its score is below the 0.65 match threshold. alternates are returned for human review.
not_hcp — Tier B found clear non-clinical role signals (Sales, Marketing, Realtor, etc.). NPI hits, if any, appear to be different individuals.
unknown — nothing found in either tier. raw_hits from Tier B (if any) are returned for adjudication.

The disposition cases (0–6, plus 3b)

The function evaluates branches in order. The first that matches wins.

matched Case 0 — Tier A high-confidence

NPI Registry returns a candidate with score ≥ 0.85 AND ahead of the runner-up by at least 0.15. Tier B still runs in parallel to populate place_of_employment from web evidence — that's why tiers_run reports ["A", "B"] even on high-confidence Tier A matches.

{
  "ok": true,
  "match": {
    "status":     "matched",
    "confidence": 0.95,
    "source":     "npi_direct",
    "tier":       "A",
    "npi":        "1992894257",
    "name_prefix":"DR.",
    "firstName":  "ILENE", "middleName": "LAUREN", "lastName": "FRIEDMAN",
    "credential": "M.D.",
    "gender":     "F",
    "job_title":       "Internal Medicine, M.D.",
    "job_description": "Internal Medicine",
    "primary_taxonomy": {
      "code":         "207R00000X",
      "name":         "Internal Medicine",
      "license":      "198905",
      "license_state":"NY"
    },
    "secondary_taxonomies": [],
    "practice_address": {
      "street":    "79 MIDDLEVILLE RD",
      "street_2":  null,
      "city":      "NORTHPORT",
      "state":     "NY",
      "zip":       "11768",
      "phone":     "631-261-0050",
      "fax":       "631-754-3017",
      "formatted": "79 MIDDLEVILLE RD, NORTHPORT, NY 11768"
    },
    "mailing_address":               null,
    "additional_practice_locations": [],
    "place_of_employment": {            // present only when Tier B inferred one
      "name":         "Huntington Hospital",
      "source":       "tier_b_web_inference",
      "evidence_url": "https://www.huntingtonhospital.org/..."
    },
    "job_title_inferred": null,         // populated when LinkedIn-style "Title - Org" pattern hits
    "enumerated_since":   "2006-10-12",
    "last_updated":       "2024-08-15",
    "match_signals":      ["state_match", "city_match", "zip_match", "specialty_match"],
    "match_reasoning":    "Tier A match via variant \"exact:Ilene Friedman/NY\". Signals: state_match, city_match, zip_match, specialty_match."
  },
  "alternates": [],
  "tiers_run":  ["A", "B"]
}

matched Case 1 — Tier A medium + Tier B corroborates

NPI score is 0.65 ≤ x < 0.85, and the Tier B classifier confirms HCP signals (degree mentions like "MD" in name-verified, geo-aligned snippets).

{
  "ok": true,
  "match": {
    "status":     "matched",
    "confidence": 0.75,
    "source":     "npi_direct" | "npi_fuzzy",
    "tier":       "A",
    /* full NPI record fields */
    "tier_b_evidence": [ /* up to 3 corroborating web hits */ ],
    "match_reasoning": "Tier A direct match (0.75); Tier B corroborates HCP via web hits (degree=MD)."
  },
  "alternates": [ ... ],
  "tiers_run":  ["A", "B"]
}

not_hcp Case 2 — Tier B finds non-clinical role

Tier B's classifier hits at least one non-clinical role signal (Sales, Marketing, Realtor, VP, Attorney, etc.) and the NPI candidates appear to be different individuals (geographic or context mismatch).

{
  "ok": true,
  "match": {
    "status":     "not_hcp",
    "confidence": 0.6,
    "source":     "tier_b_web_search",
    "tier":       "B",
    "tier_a_hits": 1,
    "flags":      ["ROLE_NON_CLINICAL"],
    "match_reasoning": "Tier B found 1 non-clinical role signal(s) in web hits. Tier A NPI hits (1) appear to be different individuals; geographic or context mismatch."
  },
  "alternates": [],
  "tiers_run":  ["A", "B"]
}

Confidence formula: min(0.85, 0.5 + 0.1 × negative_signals).
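
As a quick check of the formula, here is a small Python sketch. The function name is hypothetical, and the rounding to two decimals is an addition to keep floating-point noise out of the result.

```python
def not_hcp_confidence(negative_signals):
    """Case 2 confidence: min(0.85, 0.5 + 0.1 * negative_signals)."""
    return round(min(0.85, 0.5 + 0.1 * negative_signals), 2)


for n in (1, 2, 3, 4, 5):
    print(n, not_hcp_confidence(n))
```

One negative signal gives 0.6, which matches the example response above; the 0.85 cap is reached at four signals.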

matched Case 3 — Tier B finds HCP, no NPI

Added when the scope was broadened on 2026-05-18. Tier A couldn't place the person in NPPES, but Tier B web hits show a clear clinical degree/credential signal in geo-aligned, name-verified snippets.

Common for non-billing clinical roles: research nurses, educators, formulary pharmacists, administrators.

{
  "ok": true,
  "match": {
    "status":     "matched",
    "confidence": 0.78,
    "source":     "tier_b_web_search",
    "tier":       "B",
    "npi":        null,
    "firstName":  "...", "lastName": "...",
    "credential": "RN" | "PA" | "MD" | "...",
    "degree":     "RN" | "PA" | "MD" | "...",
    "tier_b_evidence": [ ... ],
    "flags":      ["NOT_IN_NPI_REGISTRY"],
    "match_reasoning": "Tier A could not place this name confidently in NPI Registry (top score 0.40). Tier B web hits show clear clinical degree/credential signal — likely an HCP not enumerated in NPI."
  },
  "alternates": [],
  "tiers_run":  ["A", "B"]
}

matched Case 3b — Spell-correction retry

Tier B's classifier detects that the last-name token most consistent with the corroborating hits is a Levenshtein-distance-1+ correction of the input. The function then re-runs Tier A with the corrected spelling — and if that returns a confident NPI hit, we surface the corrected record.

This is the deepest pipeline branch: tiers_run: ["A", "B", "A-retry"].

{
  "ok": true,
  "match": {
    "status":     "matched",
    "confidence": 0.78,
    "source":     "npi_fuzzy_spell_corrected",
    "tier":       "A",
    /* full NPI record from the corrected query */
    "spell_corrected": { "from": "Moscaritols", "to": "Moscarito" },
    "match_reasoning": "Input lastName \"Moscaritols\" appears to be a typo. Tier B web search surfaced \"Moscarito\" as a likely correction; Tier A re-run with the corrected spelling produced a confident NPI match."
  },
  "tiers_run": ["A", "B", "A-retry"]
}

matched Case 4 — Tier A medium alone

Same shape as Case 1, but Tier B either ran without corroborating or wasn't available (no provider configured / out of credits). The reasoning explicitly says so.

{
  "match": {
    "status":     "matched",
    "confidence": 0.65,
    "source":     "npi_direct",
    "tier":       "A",
    /* full NPI record */
    "match_reasoning": "Tier A direct match at medium confidence (0.65). Tier B did not corroborate."
  }
}

inconclusive Case 5 — Below match threshold

A top candidate exists but its score is < 0.65. alternates are returned for adjudication; raw_hits from Tier B (up to 5) too.

{
  "match": {
    "status":     "inconclusive",
    "confidence": 0.45,
    "tier":       "A",
    "match_reasoning": "Tier A produced a candidate at score 0.45 (below the 0.65 match threshold). Tier B did not provide a clear signal."
  },
  "alternates": [ ... ],
  "raw_hits":   [ ... ]
}

unknown Case 6 — Nothing found

Tier A produced zero candidates after all 5 passes, and Tier B (if available) returned no name-verified credential signals.

{
  "match": {
    "status":      "unknown",
    "confidence":  0,
    "tiers_tried": ["A", "B"],
    "variants_tried": [ "exact:Foo Bar/NY", "variant:Foo Baz/NY", ... ],
    "match_reasoning": "No matches in Tier A (NPI Registry, 7 variants tried). Tier B returned 0 hits, none with clear HCP credential signal."
  },
  "alternates": [],
  "raw_hits":   []
}

Tier A — NPI Registry cascade

A 5-pass progressive expansion against NPPES (the public CMS API at npiregistry.cms.hhs.gov). Free, no auth.

Name normalization (pre-cascade)

Diacritic strip — José → Jose, Müller → Muller via NFD + combining-mark removal.
Whitespace collapse + trim — runs of spaces become single spaces; leading/trailing punctuation stripped.
Suffix & credential stripping — trailing tokens matching the suffix/credential vocab are removed before lookup:
- Suffixes: jr, sr, ii, iii, iv, v
- Degrees: md, do, dds, dmd, dpm, dpt, pharmd, dnp, phd, dvm, od
- Credentials: rn, aprn, crna, cnm, np, pa-c, pa, rph, rrt, rdn, lcsw, msw
- Academic: mba, msc, ms, ma, ba, bsn, msn, dnsc, edd, jd
- Fellowships: facs, facp, fasco, faan, faonl, nea-bc, ne-bc, ahn-bc, ccrn, cphq
Dotted forms (M.D.) are normalized first by stripping periods.
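
The three normalization steps can be sketched in Python roughly as follows. This is an illustration only: the vocabularies are abbreviated, `normalize_name` is a hypothetical name, and casing is left untouched here.

```python
import re
import unicodedata

# Abbreviated vocab for illustration; the lists above are the full set.
SUFFIXES = {"jr", "sr", "ii", "iii", "iv", "v"}
CREDENTIALS = {"md", "do", "dds", "pharmd", "dnp", "phd", "rn", "np", "pa-c", "pa", "rph"}


def strip_diacritics(text):
    """NFD-decompose, then drop combining marks (José -> Jose)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_name(raw):
    """Return (clean_name, stripped_tokens) for one name field."""
    text = strip_diacritics(raw)
    text = re.sub(r"\s+", " ", text).strip(" ,.;")
    tokens = text.split(" ")
    stripped = []
    # Peel trailing suffix/credential tokens, but always keep one name token.
    while len(tokens) > 1:
        candidate = tokens[-1].replace(".", "").strip(",").lower()
        if candidate not in SUFFIXES and candidate not in CREDENTIALS:
            break
        stripped.insert(0, candidate)
        tokens.pop()
    return " ".join(tokens).strip(" ,"), stripped
```

With this sketch, "Anthony M.D." yields the name "Anthony" with "md" captured separately, in line with the request-schema example.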

Nickname expansion

A bidirectional table of ~110 canonical names with their common nicknames. The lookup walks both directions: input "Bob" expands to {Robert, Rob, Bob, Bobby, Robbie, Bert}, and input "Robert" expands to the same set. Both male and female forms are covered.

Sample of the nickname table

Canonical	Variants
Robert	Rob, Bob, Bobby, Robbie, Bert
William	Will, Bill, Billy, Willie, Liam
Richard	Rick, Dick, Rich, Richie, Rickey
Michael	Mike, Mikey, Mick, Mickey
James	Jim, Jimmy, Jamie, Jay
John	Jack, Johnny, Jon
Anthony	Tony, Ant
Elizabeth	Liz, Beth, Betty, Eliza, Lizzie, Ellie, Libby, Betsy, Bess
Margaret	Maggie, Meg, Peggy, Margie, Madge, Greta
Catherine	Cathy, Kate, Katie, Kit, Cat
Reynaldo	Rey, Ronnie, Naldo

Full table at src/lib/name_variants.js lines 6–149.
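
A bidirectional lookup of this kind might look like the Python sketch below. It uses three rows from the sample table; the names `NICKNAMES`, `CANONICAL_OF`, and `expand_first_name` are hypothetical and do not reflect the contents of name_variants.js.

```python
# A few rows from the sample table above; the real table has ~110 entries.
NICKNAMES = {
    "robert": ["rob", "bob", "bobby", "robbie", "bert"],
    "william": ["will", "bill", "billy", "willie", "liam"],
    "anthony": ["tony", "ant"],
}

# Reverse index: every variant points back at its canonical name.
CANONICAL_OF = {nick: canon for canon, nicks in NICKNAMES.items() for nick in nicks}


def expand_first_name(name):
    """Return the input plus every name sharing its canonical form."""
    key = name.strip().lower()
    canon = key if key in NICKNAMES else CANONICAL_OF.get(key)
    if canon is None:
        return {key}  # unknown name: no expansion
    return {canon, *NICKNAMES[canon]}
```

Because the reverse index is built from the same table, "Bob" and "Robert" resolve to the identical set.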

Last-name variants

Hyphenated last names fan out:

"Smith-Jones" → ["Smith-Jones", "Smith Jones", "Smith", "Jones"]

Each variant is tried as a separate NPPES query (the function appends a trailing wildcard to every query; see "NPPES wildcard trick" below).
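
The hyphen fan-out is simple enough to sketch in a few lines of Python. The function name is hypothetical; the output order follows the example above.

```python
def last_name_variants(last_name):
    """Fan a hyphenated surname out into the variants tried against NPPES."""
    name = last_name.strip()
    if "-" not in name:
        return [name]
    parts = [p for p in name.split("-") if p]
    variants = [name, " ".join(parts), *parts]
    # Drop duplicates while keeping the documented order.
    return list(dict.fromkeys(variants))
```

A surname without a hyphen passes through as a single-element list.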

NPPES wildcard trick

NPPES's default search does exact matches on first_name and last_name. That misses compound first names ("SUSAN ANABELLE" stored as one field). The function appends a trailing * to both fields on every query, which catches compounds without breaking exact matches.
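
A sketch of how such a query URL could be assembled in Python is shown here. The `version`, `first_name`, `last_name`, `state`, and `limit` parameters belong to the public NPPES API; the function name and the `limit` value of 20 are assumptions, not taken from the screener.

```python
from urllib.parse import urlencode

NPPES_URL = "https://npiregistry.cms.hhs.gov/api/"


def build_nppes_query(first, last, state=None):
    """Build an NPPES lookup URL with trailing wildcards on both name fields."""
    params = {
        "version": "2.1",
        "first_name": f"{first}*",
        "last_name": f"{last}*",
        "limit": 20,  # assumed page size, not from the source
    }
    if state:
        params["state"] = state
    return f"{NPPES_URL}?{urlencode(params)}"
```

`urlencode` percent-encodes the asterisk as `%2A`, so the wildcard survives transport and is decoded by the server.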

The 5-pass cascade

Tier A · pass 1
Exact name + state filter: the cheapest, most precise pass.
Label: exact:<First> <Last>/<ST>

Tier A · pass 2
Nickname × last-name variants, with state filter. Cross-product of first-name expansions × hyphen-split last names.
Label: variant:<Nick> <LastVariant>/<ST>

Tier A · pass 3
Drop state filter for a broader net. Only fires if a state hint was given AND no candidate so far scored ≥ 0.80.
Label: nostate:<First> <Last>

Tier A · pass 4
First-initial + last-name + state. Catches badge data with just "R. Rivera" instead of "Reynaldo Rivera".
Label: initial:R* <Last>/<ST>

Tier A · pass 5
First/last transposition + state. Some badge data has the fields swapped, so try first=<Last>, last=<First>.
Label: swap:<Last> <First>/<ST>

Early exit: if pass 1 produces any candidate with score ≥ 0.90, passes 2–5 are skipped entirely. Most well-formed inputs short-circuit here.
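
The control flow of the cascade, including the early exit and the pass-3 condition, can be sketched in Python. This is a simplified model under stated assumptions: `plan_passes` and `run_cascade` are hypothetical names, and `search` is a stand-in callable that returns candidate scores for a query label rather than calling NPPES.

```python
def plan_passes(first, last, state, first_variants, last_variants):
    """Return the query labels for the five passes, in order."""
    st = f"/{state}" if state else ""
    passes = [[f"exact:{first} {last}{st}"]]                      # pass 1
    passes.append([                                               # pass 2
        f"variant:{fv} {lv}{st}"
        for fv in first_variants for lv in last_variants
        if (fv, lv) != (first, last)
    ])
    # Pass 3 only makes sense when there is a state filter to drop.
    passes.append([f"nostate:{first} {last}"] if state else [])   # pass 3
    passes.append([f"initial:{first[0]}* {last}{st}"])            # pass 4
    passes.append([f"swap:{last} {first}{st}"])                   # pass 5
    return passes


def run_cascade(passes, search, early_exit=0.90, nostate_cutoff=0.80):
    """Run passes in order; `search(label)` is a hypothetical callable
    returning a list of candidate scores for that query."""
    best, tried = 0.0, []
    for number, labels in enumerate(passes, start=1):
        if number == 3 and best >= nostate_cutoff:
            continue  # pass 3 fires only if nothing scored >= 0.80 so far
        for label in labels:
            tried.append(label)
            best = max([best, *search(label)])
        if number == 1 and best >= early_exit:
            break     # early exit: skip passes 2-5
    return best, tried
```

When the first query already returns a 0.95 candidate, only the `exact:` label is tried; when nothing is found, every label across all five passes is attempted.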

Tier B — Web search + classifier

Provider-agnostic search layer (Brave / Serper / Google CSE). Used to corroborate Tier A medium-confidence matches and to identify HCPs that aren't in NPPES.

Provider selection

Driven by two env vars on the Netlify site, plus a third that applies to Google CSE only:

WEB_SEARCH_PROVIDER = "serper" | "brave" | "google_cse"
WEB_SEARCH_API_KEY  = "..."
GOOGLE_CSE_ID       = "..."   // only for google_cse

If neither is set, Tier B returns { available: false, reason: "..." } and the function falls through to a Tier-A-only response.

Currently configured: Serper.dev (paid plan, 50k queries/month).

The 5 queries Tier B fires per attendee

Tightest: "<First Last>" "<City>" <ST> — name + quoted city + state. Catches local non-clinical context too (deliberately broad to detect "not_hcp").
Name + city + clinical-role terms: "<First Last>" "<City>" <ST> doctor physician nurse pharmacist
Name + employer + specialty: "<First Last>" <employer> <specialty> (only if those fields were provided)
LinkedIn fallback: "<First Last>" "<City>" linkedin
Typo tolerance: <First Last> <ST> doctor physician nurse pharmacist — unquoted, lets Google auto-spell-correct.

Queries are deduplicated and any whose body is essentially just the name are dropped. The provider returns up to 5 hits per query.
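
The query assembly, deduplication, and name-only filtering might be sketched in Python like this. The function name and argument names are hypothetical, and the exact query strings the screener emits may differ.

```python
def build_tier_b_queries(first, last, city=None, state=None,
                         employer=None, specialty=None):
    """Assemble the Tier B search queries, skipping ones with no context."""
    name = f'"{first} {last}"'
    roles = "doctor physician nurse pharmacist"
    city_q = f'"{city}"' if city else ""
    st = state or ""

    candidates = [
        f"{name} {city_q} {st}",                      # tightest
        f"{name} {city_q} {st} {roles}",              # clinical-role terms
        (f"{name} {employer or ''} {specialty or ''}"
         if (employer or specialty) else ""),         # employer + specialty
        f"{name} {city_q} linkedin" if city else "",  # LinkedIn fallback
        f"{first} {last} {st} {roles}",               # unquoted, typo-tolerant
    ]

    queries = []
    for q in candidates:
        q = " ".join(q.split())                       # collapse gaps from empty parts
        bare = q.replace('"', "")
        # Drop empty queries, name-only queries, and duplicates.
        if q and bare != f"{first} {last}" and q not in queries:
            queries.append(q)
    return queries
```

With only a first and last name, the tightest query collapses to the bare name and is dropped, leaving the two role-term queries.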

Classifier — `classifyHits()`

Up to the first 15 aggregated hits are inspected. For each, the title + snippet are scanned for three regex patterns:

Positive degree pattern

/\b(MD|D\.O\.|DO|DDS|DMD|DPM|DPT|PharmD|DNP|PhD|DVM|OD)\b/

Positive credential pattern

/\b(RN|APRN|CRNA|CNM|NP|PA-C|PA|RPh|RRT|RDN|NEA-BC|FAAN|FACS|FACP)\b/

Negative-role pattern

/\b(VP|Vice President|CEO|CFO|COO|Founder|Investor|Sales|Marketing|Realtor|Real Estate|Attorney|Lawyer|Tenant|Tenants Association|Activist|Politician|Councilman|Mayor)\b/i
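
For readers working in Python, the three patterns translate to the `re` module roughly as follows. One deliberate difference, which is an assumption of this sketch: periods are stripped from the text before matching, so "M.D." and "D.O." are caught by the plain `MD` and `DO` alternatives and the separate `D\.O\.` branch is dropped. The helper name `scan_hit` is hypothetical.

```python
import re

DEGREE = re.compile(r"\b(MD|DO|DDS|DMD|DPM|DPT|PharmD|DNP|PhD|DVM|OD)\b")
CREDENTIAL = re.compile(
    r"\b(RN|APRN|CRNA|CNM|NP|PA-C|PA|RPh|RRT|RDN|NEA-BC|FAAN|FACS|FACP)\b"
)
NEGATIVE_ROLE = re.compile(
    r"\b(VP|Vice President|CEO|CFO|COO|Founder|Investor|Sales|Marketing|"
    r"Realtor|Real Estate|Attorney|Lawyer|Tenant|Tenants Association|"
    r"Activist|Politician|Councilman|Mayor)\b",
    re.IGNORECASE,
)


def scan_hit(title, snippet):
    """Report which of the three pattern families fire on one search hit."""
    # Assumption of this sketch: strip periods so dotted forms match.
    text = f"{title} {snippet}".replace(".", "")
    return {
        "degrees": DEGREE.findall(text),
        "credentials": CREDENTIAL.findall(text),
        "negative_roles": NEGATIVE_ROLE.findall(text),
    }
```

The degree and credential patterns stay case-sensitive, as in the originals, so an ordinary word like "do" does not register as a D.O. degree.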

Name verification — Levenshtein, both names required

Every positive hit must have both the input firstName and lastName fuzzy-matched to tokens in the snippet via Levenshtein distance. This was added 2026-05-18 after two false-positive incidents:

"Kevin Weller / Jersey City NJ" — was matching a snippet about "Kevin John Weller, APRN" in New Hampshire (different state).
"Joyce Moscaritols" (typo) — was matching "Dr. Michael Joyce, MD" because the input firstName "Joyce" appeared as someone else's last name.

Distance threshold per token: min(3, max(1, ceil(needle.length × 0.30))). So a 6-letter name allows up to 2 edits; a 10-letter name allows up to 3.
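
The threshold and the both-names rule can be illustrated with a short Python sketch. The function names are hypothetical, the ceiling is computed with integer math to avoid floating-point surprises, and tokenization is simplified to runs of letters.

```python
import re


def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def max_edits(needle):
    """min(3, max(1, ceil(len * 0.30))), using integer math for the ceiling."""
    return min(3, max(1, (len(needle) * 3 + 9) // 10))


def name_verified(first, last, snippet):
    """True only if BOTH names fuzzy-match some token in the snippet."""
    tokens = re.findall(r"[a-z]+", snippet.lower())

    def hit(needle):
        needle = needle.lower()
        return any(levenshtein(needle, t) <= max_edits(needle) for t in tokens)

    return hit(first) and hit(last)
```

Under this sketch, the input "Joyce Moscaritols" is rejected against "Dr. Michael Joyce, MD" because no token is within three edits of the last name, while "Joyce Moscarito, MD" passes at distance 2.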

Geographic alignment

Each name-verified credential snippet is tagged:

aligned — the input state code OR full state name appears in the snippet, OR the input city appears in the snippet
mismatched — state codes appear in the snippet, but none match the input state
unknown — no state info detected

State-code matching uses comma/space context ("City, ST" patterns) to avoid false positives like "MS Excel."

Decision logic

if (neg > 0 && name_matched_aligned === 0)        → isHCP: false
else if (name_matched_aligned ≥ 1)                  → isHCP: true
else if (name_matched_count ≥ 2 && no mismatch
                            && no negatives)        → isHCP: true
else                                                → isHCP: false
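
The same four branches, written as a runnable Python function for reference. The parameter names mirror the pseudocode; `any_mismatch` stands for "at least one name-verified snippet was tagged mismatched", which is an interpretation of "no mismatch".

```python
def is_hcp(negatives, name_matched_aligned, name_matched_count, any_mismatch):
    """Mirror the four-branch decision table above; first match wins."""
    if negatives > 0 and name_matched_aligned == 0:
        return False
    if name_matched_aligned >= 1:
        return True
    if name_matched_count >= 2 and not any_mismatch and negatives == 0:
        return True
    return False
```

A single geo-aligned, name-verified credential hit is enough for a positive result even when negative-role signals are present, because the second branch is reached before the negatives are considered again.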

The classifier also returns a spell_corrected_last_name field whenever the name-verified last-name match has distance > 0 — i.e., the snippet's lastName isn't identical to the input's. This is what triggers the Case 3b retry.

LLM enrichment layer (added v0.3.0)

After Tier A + Tier B settle on a disposition, a single live LLM call enriches the result with judgment fields, chiefly position_function and call_recommendation (the full list follows under "What it produces"). It runs on every disposition, including not_hcp and unknown, because the call recommendation is most useful for non-HCPs.

What it produces

highest_education — highest level of education / terminal degree. Deterministic from a clear NPPES credential (M.D., PharmD, DNP, Ph.D., …); otherwise inferred from web evidence (e.g. "Dean of the School of Medicine" → a doctorate). null only when there's no signal.
position_function — a 1–2 sentence plain-language explanation of what the identified position/specialty/credential typically does (core role & responsibility), for a non-clinical reader. null when no position was identified.
call_recommendation — { should_call, category, rationale, confidence }. Confirmed HCPs return should_call: true, category: "HCP". For non-HCPs, the model judges from the web evidence whether the person is still a valuable medical-affairs contact in the diabetes/obesity/cardiometabolic space — a professor of medicine, dean / department chair, KOL / thought leader, podcaster / online influencer, P&T or formulary committee member, professional-association member, patient-advocacy NGO/non-profit member, researcher/academic, or payer/managed-care. A current Novo Nordisk / NNI employee returns category: "Internal/Own-Company" (own-company colleague, not an outreach target). Clearly unrelated roles or namesakes return should_call: false, category: "Not Recommended".
social_profiles — an array of the person's reliably-attributed digital profiles: [{ platform, url, handle, followers, reliable, basis }]. Only profiles corroborated to this person (by name + employer/location/specialty) are included; likely namesakes are excluded. followers is populated only when a snippet states it.
dol — Digital Opinion Leader assessment: { is_dol, tier, topics, rationale, confidence } where tier ∈ none | emerging | established | leading. Grades digital influence in the diabetes/obesity/cardiometabolic space from the reliable profiles + evidence (audience size, posting cadence, podcast/video hosting, conference presence). A plain profile with no influence signal stays tier: "none".
identity_ids — resolved cross-reference ids { cdm_id, ims_id, npi_id } from the OSP identity service (looked up by last name + first name + ZIP). Any field may be null. The npi_id here independently corroborates the Tier-A NPI match.
prescribing_data — IQVIA-style prescribing for NNI cardiometabolic products, fetched by ims_id: { source, ims_id, total_trx, total_nbrx, prescribed[], products[] }. Any cardiometabolic prescribing (total_trx > 0) is treated by the recommendation gate as a direct in-scope tie — so a clinician in an adjacent specialty who actually prescribes (e.g. a neurologist on Ozempic) is recommended, while a zero-TRx result is affirmative evidence of non-relevance.

Specialty-relevance gate: a confirmed HCP is only should_call: true when their specialty is core to NNI's diabetes/obesity/cardiometabolic focus (endocrinology, primary care, cardiology, nephrology, obesity medicine) OR there is specific evidence of a cardiometabolic tie — prescribing data, or a documented sub-focus (e.g. diabetic neuropathy). HCPs in unrelated specialties return should_call: false with a potential_relevance note describing what would make them relevant. A current Novo Nordisk / NNI employee returns category: "Internal/Own-Company".

Configuration

Driven by env vars on the Netlify site:

LLM_PROVIDER   = "anthropic" | "openai"      // default: anthropic
LLM_API_KEY    = "..."                        // provider API key
LLM_MODEL      = "claude-haiku-4-5-20251001"  // default per provider
LLM_TIMEOUT_MS = "6000"                        // optional per-call timeout

Graceful degradation: if LLM_API_KEY is unset, or the call fails / times out / returns unparseable output, both fields come back null with a reason in their *_source field. The screening result is never blocked on the LLM. The call is also skipped (no spend) when there is no position and no web signal to reason about.

Identical prompts are de-duplicated by an ephemeral per-container memo (saves tokens during batch runs); this is process memory only, not persistence — the stateless guarantee holds.

Spell-correction retry (Case 3b)

The deepest sequential branch in the pipeline — tiers_run: ["A", "B", "A-retry"].

When Tier B's classifier emits a spell_corrected_last_name that differs from the input's lastName (case-insensitive), the function:

Builds correctedInput = { ...input, lastName: spell_corrected_last_name }
Calls tierA_npiFuzzy(correctedInput) — a fresh 5-pass NPPES cascade with the corrected name
If the retry's top candidate scores ≥ 0.65, returns the corrected match with source: "npi_fuzzy_spell_corrected" and a spell_corrected: { from, to } field
Otherwise falls through to Cases 1–6 with original input

This is what makes typo'd badge data recoverable. Input "Moscaritols" still returns the correct NPI record for Dr. Joyce Moscarito — with the corrected spelling labeled in the response so the consumer knows it was repaired.
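
The retry steps can be sketched in Python with the Tier A call injected as a parameter. Everything here is illustrative: `tier_a` is a hypothetical stand-in for tierA_npiFuzzy that returns the top candidate as a dict with a "confidence" key, or None.

```python
def spell_correction_retry(input_fields, corrected_last_name, tier_a, threshold=0.65):
    """Re-run Tier A with a corrected surname; return a match dict or None.

    `tier_a` is a hypothetical callable standing in for tierA_npiFuzzy.
    """
    original = input_fields["lastName"]
    if not corrected_last_name or corrected_last_name.lower() == original.lower():
        return None  # nothing to correct

    corrected_input = {**input_fields, "lastName": corrected_last_name}
    top = tier_a(corrected_input)
    if top is None or top["confidence"] < threshold:
        return None  # caller falls through to Cases 1-6 with the original input

    return {
        **top,
        "status": "matched",
        "source": "npi_fuzzy_spell_corrected",
        "spell_corrected": {"from": original, "to": corrected_last_name},
    }
```

Returning None keeps the fall-through decision with the caller, so the original input is never mutated by a failed retry.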

Scoring & confidence

Every NPI candidate gets a score in [0, 1]. The base is 0.5 (caller already filtered by name); signals add or subtract.

Signal weights

Signal	When it fires	Δ score
`state_match`	Input state == NPI practice OR mailing state	+0.20
`state_mismatch`	Both input and NPI have states, and they differ	−0.25
`city_match`	Input city (lowercased) == NPI practice/mailing city	+0.10
`zip_match`	Input ZIP (5-digit) == NPI practice ZIP (5-digit)	+0.10
`specialty_match`	Input `specialty_hint` appears in NPI primary taxonomy description (either direction)	+0.15
`specialty_soft_mismatch`	Both have specialty info but no substring overlap	−0.03
`employer_substring`	Input employer substring-matches NPI practice `address_1`	+0.08

Final score is clamped to [0, 1].
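
The weights above combine additively. A minimal Python sketch follows; the function name is hypothetical, and the rounding to two decimals is an addition of this sketch to suppress floating-point noise.

```python
BASE_SCORE = 0.5
WEIGHTS = {
    "state_match": 0.20,
    "state_mismatch": -0.25,
    "city_match": 0.10,
    "zip_match": 0.10,
    "specialty_match": 0.15,
    "specialty_soft_mismatch": -0.03,
    "employer_substring": 0.08,
}


def score_candidate(signals):
    """Sum the weights of the fired signals onto the base, clamped to [0, 1]."""
    raw = BASE_SCORE + sum(WEIGHTS[s] for s in signals)
    return round(min(1.0, max(0.0, raw)), 2)
```

For example, a candidate with `state_match` and `city_match` scores 0.8, while a lone `state_mismatch` drops the score to 0.25.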

Disposition thresholds

Top score	Runner-up gap	Disposition
≥ 0.85	≥ 0.15 ahead of #2	matched high confidence — Case 0 short-circuit
≥ 0.85	< 0.15 ahead of #2	Falls through to Tier B (ambiguous top hit)
0.65 – 0.84	any	matched medium confidence (Cases 1 / 4)
< 0.65 (but > 0)	any	inconclusive Case 5
0	—	unknown Case 6
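
The threshold table can be read as a small decision function. The Python sketch below covers only what Tier A's scores determine; Cases 2, 3, and 3b depend on Tier B and are outside it. The function name and return strings are hypothetical.

```python
def tier_a_disposition(top_score, runner_up=0.0):
    """Map Tier A's top score and its lead over #2 to the next step."""
    if top_score <= 0:
        return "unknown (Case 6)"
    if top_score < 0.65:
        return "inconclusive (Case 5)"
    if top_score < 0.85:
        return "matched medium (Cases 1 / 4)"
    if top_score - runner_up >= 0.15:
        return "matched high (Case 0)"
    return "ambiguous top hit: defer to Tier B"
```

A 0.90 top score with a 0.85 runner-up is not a Case 0 match: the gap is too small, so the decision is deferred to Tier B.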

Full pipeline

The complete sequence of branches from input to response.

Pre-flight (screen.js:22-36)
CORS preflight (OPTIONS → 204) · Method gate (POST only) · Auth check (X-API-Key)

Normalize (screen.js:39-68)
Parse JSON · validate firstName + lastName · extractCredentials · normalize

Tier A (screen.js:75 → npi_fuzzy.js)
tierA_npiFuzzy(input): 5-pass cascade against NPPES

Case 0 (screen.js:83-95)
If aDisp.confidence ≥ 0.85 → return matched Case 0 immediately

Tier B (screen.js:103 → web_search.js)
tierB_webSearch(input): 5 Serper queries

Tier B classifier (screen.js:105 → web_search.js)
classifyHits(tierB.hits, input): name verification + geo alignment + degree/credential extraction

Tier A retry (screen.js:110-129)
If classifier flagged spell_corrected_last_name ≠ input.lastName → tierA_npiFuzzy with corrected name

Case 3b (screen.js:115-128)
If retry confidence ≥ 0.65 → return matched Case 3b with spell_corrected annotation

Final disposition (screen.js:140-246)
Evaluate Cases 1 → 2 → 3 → 4 → 5 → 6 in order, return the first that matches

Maximum external HTTP calls per screening

Source	Max calls	When
NPPES (Tier A #1)	~10	One per name variant — exact, nicknames, hyphen splits, no-state, initial, swap
Serper (Tier B)	5	4 quoted variants + 1 unquoted spell-tolerant
NPPES (Tier A retry)	~10	Only fires on Case 3b spell-correction
Total worst case	~25 outbound HTTPS calls	Per single screening

HTTP status codes

Code	When
200 + `ok:true`	Screening completed — any of `matched / inconclusive / not_hcp / unknown`
204	CORS preflight (OPTIONS)
400	Malformed JSON body, or `firstName` / `lastName` missing
401	Missing or wrong `X-API-Key` header
405	Any method other than POST or OPTIONS
502	Tier A error (NPPES upstream failure). Tier B errors do not propagate as 502 — they degrade the response to Tier-A-only with reasoning that notes Tier B was unavailable.
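
On the caller side, the status codes above might be handled along these lines. This Python sketch is a suggestion only: the function name and the action labels are hypothetical, and retrying after a 502 is an assumption rather than documented guidance.

```python
def interpret_response(status_code, body):
    """Suggest a caller-side action for each documented status code."""
    if status_code == 200 and body.get("ok"):
        return f"done:{body['match']['status']}"
    if status_code == 400:
        return "fix_request"    # malformed JSON or missing names
    if status_code == 401:
        return "check_api_key"  # missing or wrong X-API-Key
    if status_code == 405:
        return "use_post"       # only POST (and OPTIONS) are accepted
    if status_code == 502:
        return "retry_later"    # NPPES upstream failure; retrying is an assumption
    return "unexpected"
```

Note that a 200 response still needs its `match.status` inspected, since `inconclusive`, `not_hcp`, and `unknown` all arrive with `ok: true`.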

Stateless guarantee

No persistence anywhere. Each request is independent.

This is an explicit project decision dated 2026-05-18. No database, no call logs, no audit trail. The function code has no fs.writeFile, no Supabase client, no logging service. The only thing the function writes is the response body itself.

If/when analytics or compliance auditing is needed, that's a meaningful architectural decision that requires explicit sign-off — not a one-liner addition.
