Overview
A stateless REST API that takes a single convention-booth attendee and tries to identify them as a true Healthcare Professional (HCP) — returning degree and specialty when possible.
The upstream pipeline has two earlier identification stages: Level 1 matches against a pharma customer database (CDMI), and Level 2 attempts a direct NPI Registry exact match. This screener is Level 3 — the fallback when both upstream tiers fail.
What "HCP" means here
Any licensed clinician identifiable via degree (MD / DO / DDS / DMD / DPM / DPT / PharmD / DNP / OD) or credential string (RN / NP / PA / CNM / CRNA / RPh, etc.). NPI is a strong corroborating signal but not a hard requirement — research nurses, educators, and pharmacists in non-billing roles can be HCPs without being enumerated in NPPES.
Endpoint
POST https://nni-ma-hcp-screener.netlify.app/api/screen
Also reachable directly at POST /.netlify/functions/screen (the /api/screen path is a Netlify redirect).
Quickstart
Identify a single attendee with one HTTP call.
curl -X POST "https://nni-ma-hcp-screener.netlify.app/api/screen" \
-H "Content-Type: application/json" \
-H "X-API-Key: YOUR_API_KEY_HERE" \
-d '{
"firstName": "Ilene",
"lastName": "Friedman",
"state": "NY",
"city": "Northport",
"postalCode": "11768",
"specialty_hint": "Internal Medicine"
}'
const res = await fetch('https://nni-ma-hcp-screener.netlify.app/api/screen', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'X-API-Key': 'YOUR_API_KEY_HERE'
},
body: JSON.stringify({
firstName: 'Ilene',
lastName: 'Friedman',
state: 'NY',
city: 'Northport',
postalCode: '11768',
specialty_hint: 'Internal Medicine'
})
});
const data = await res.json();
console.log(data.match.status, data.match.confidence, data.match.npi);
import requests
resp = requests.post(
"https://nni-ma-hcp-screener.netlify.app/api/screen",
headers={
"Content-Type": "application/json",
"X-API-Key": "YOUR_API_KEY_HERE",
},
json={
"firstName": "Ilene",
"lastName": "Friedman",
"state": "NY",
"city": "Northport",
"postalCode": "11768",
"specialty_hint": "Internal Medicine",
},
timeout=30,
)
data = resp.json()
print(data["match"]["status"], data["match"]["confidence"], data["match"].get("npi"))
A successful response (HTTP 200) wraps the result in a small envelope:
{
"ok": true,
"match": { "status": "matched", "confidence": 0.95, ... },
"alternates": [ ... ],
"tiers_run": ["A"],
"input_echo": { "firstName": "Ilene", "lastName": "Friedman", "state": "NY", "city": "Northport" }
}
Authentication
Every request must carry an X-API-Key header.
X-API-Key: 449ba0...e74650ebad948
Without the header (or with a wrong value) the function returns HTTP 401 with { "ok": false, "error": "Invalid or missing X-API-Key header." }.
The key is stored as the SCREENER_API_KEY env var on the Netlify site. To rotate: generate a new value, PUT it to the same env var via the Netlify API, redeploy, notify upstream.
Request schema
JSON body, application/json. Only first + last name are strictly required. The richer the input, the better the disambiguation.
| Field | Required? | Type / format | Purpose & normalization |
|---|---|---|---|
| firstName | Required | string | Lowercased + whitespace-collapsed. Trailing credentials are auto-stripped ("Anthony MD" → "Anthony", credentials captured separately). |
| lastName | Required | string | Same normalization. Hyphens preserved; splits into variants in Tier A ("Smith-Jones" → ["Smith-Jones", "Smith Jones", "Smith", "Jones"]). |
| middleName (or middleInitial) | Optional | string | Disambiguation aid; passed to the LLM enrichment context. (The web console makes this a required field; enter "none" if none.) |
| organization (or employer) | Recommended | string | Strongest disambiguator for non-HCPs. Drives LinkedIn- and employer-site-led Tier B queries and the LLM enrichment. Also substring-matched against NPI practice-address line 1 (+0.08 on hit). The web console requires it (enter "none" if unknown). |
| suffix | Optional | string | Jr / Sr / III / etc. Currently echoed in the input. |
| Optional | string | Lowercased. Domain used as a corroborating employer signal. | |
| city | Optional | string | Used as a geo signal. Adds +0.10 to Tier A score on match. Also used by Tier B classifier to disambiguate namesakes in different cities. |
| state | Optional | 2-letter code | Uppercased + trimmed. Filters initial Tier A passes. Adds +0.20 on match, subtracts −0.25 on mismatch. Tier B classifier uses it for geo alignment. |
| postalCode (or zip) | Optional | 5-digit string | Non-digits stripped, sliced to first 5. Adds +0.10 on exact match against NPI practice ZIP. |
| specialty_hint (or specialtyHint) | Optional | string | Substring-matched (either direction) against NPI primary taxonomy description. Adds +0.15 on match, −0.03 on soft mismatch. |
| convention | Optional | string | Free-form metadata. Currently not used in scoring; preserved for future analytics. |
Minimum vs. recommended input
Minimum: { "firstName": "...", "lastName": "..." } — works but low-confidence (no geo signals).
Recommended: First + Last + postalCode + state + city + organization (+ specialty_hint when known). For common names — and for non-HCPs at small specialized meetings with no badge data — organization is the single most valuable disambiguator.
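The normalization rules in the request-schema table can be mirrored on the client before a request is sent. The sketch below is illustrative only: build_payload is a hypothetical helper, not part of the API, and the server is assumed to apply its own normalization regardless of what the client does.

```python
import re

def build_payload(first_name, last_name, *, state=None, city=None,
                  postal_code=None, organization=None, specialty_hint=None):
    """Hypothetical client-side helper: assemble a request body for /api/screen."""
    if not first_name or not last_name:
        raise ValueError("firstName and lastName are required")
    payload = {"firstName": first_name.strip(), "lastName": last_name.strip()}
    if state:
        payload["state"] = state.strip().upper()          # 2-letter code, uppercased + trimmed
    if city:
        payload["city"] = city.strip()
    if postal_code:
        digits = re.sub(r"\D", "", str(postal_code))[:5]  # strip non-digits, keep first 5
        if digits:
            payload["postalCode"] = digits
    if organization:
        payload["organization"] = organization.strip()
    if specialty_hint:
        payload["specialty_hint"] = specialty_hint.strip()
    return payload
```

Optional fields that are empty are simply left out, so the minimum input stays a two-key object.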
Response schema
Every successful response shares a common envelope. Status-specific fields nest under match.
{
"ok": true,
"match": {
"status": "matched" | "inconclusive" | "not_hcp" | "unknown",
"confidence": 0.0 - 1.0,
"source": "npi_direct" | "npi_fuzzy" | "npi_fuzzy_spell_corrected" | "tier_b_web_search",
"tier": "A" | "B",
// ... status-specific fields below ...
"match_reasoning": "human-readable string explaining why this disposition"
},
"alternates": [ /* up to 4 runner-up matches */ ],
"tiers_run": ["A"] | ["A", "B"] | ["A", "B", "A-retry"],
"input_echo": { "firstName": "...", "lastName": "...", "state": "...", "city": "..." }
}
Source values
| Source | Meaning | Trust |
|---|---|---|
| npi_direct | NPI Registry returned this person on a literal first+last variant. No fuzzy expansion needed. | Highest |
| npi_fuzzy | NPI Registry match via a name variant (nickname, hyphen split, wildcard). Still NPI-authoritative. | High |
| npi_fuzzy_spell_corrected | Input had a typo; Tier B suggested a correction; Tier A retry with the corrected spelling hit. Returns the corrected NPI record. | High |
| tier_b_web_search | No reliable NPI hit; classification came from web search results. Used for non-NPI HCPs (research nurses, etc.) and for not_hcp dispositions. | Lowest; adjudicate |
Match-object field reference
| Field | Type | Where it comes from | Notes |
|---|---|---|---|
| status | string | Disposition logic | matched / inconclusive / not_hcp / unknown |
| confidence | 0.0–1.0 | Scoring | ≥0.85 high · 0.65–0.84 medium · <0.65 inconclusive |
| source | string | Pipeline branch | npi_direct / npi_fuzzy / npi_fuzzy_spell_corrected / tier_b_web_search |
| tier | "A" \| "B" | Pipeline branch | Which tier produced the match |
| npi | string \| null | NPPES | 10-digit NPI; null for Case-3 non-NPI HCPs |
| name_prefix | string \| null | NPPES basic.name_prefix | e.g. "DR." |
| firstName / middleName / lastName / name_suffix | strings | NPPES | Raw NPPES casing |
| credential | string \| null | NPPES | e.g. "M.D.", "RN", "PA-C" |
| gender | "M" \| "F" \| null | NPPES | Self-reported |
| job_title | string \| null | Composite (credential + taxonomy) | e.g. "Internal Medicine, M.D.". Synthesized; not a raw NPPES field. |
| job_description | string \| null | NPPES taxonomy desc | Same value as primary_taxonomy.name, surfaced as a friendly alias. |
| primary_taxonomy | object | NPPES | { code, name, license, license_state } |
| secondary_taxonomies | array | NPPES | Multi-specialty providers' additional taxonomies |
| practice_address | object | NPPES address with purpose=LOCATION | { street, street_2, city, state, zip, phone, fax, formatted } |
| mailing_address | object \| null | NPPES address with purpose=MAILING | Same shape as practice_address; null if same as practice |
| additional_practice_locations | array | NPPES practiceLocations | Multi-site practitioners |
| place_of_employment | object \| null | Tier B inference | { name, source, evidence_url }, extracted from web snippets matching "at <Org>" / LinkedIn patterns. source is always "tier_b_web_inference"; never authoritative. |
| job_title_inferred | object \| null | Tier B inference | Same shape; only set when LinkedIn-style "Title · Org" pattern fires. |
| enumerated_since | YYYY-MM-DD | NPPES | When the NPI was issued |
| last_updated | YYYY-MM-DD | NPPES | When the NPPES record was last touched |
| match_signals | array of strings | Scoring | Which Δ-score signals fired |
| match_reasoning | string | Function | Human-readable explanation |
| flags | array | Function | e.g. ["NOT_IN_NPI_REGISTRY"], ["ROLE_NON_CLINICAL"] |
| spell_corrected | object \| absent | Case 3b only | { from, to } |
| tier_b_evidence | array | Tier B classifier | Up to 3 supporting web hits (Cases 1, 2, 3) |
| Enrichment fields (v0.3.x), derived after Tier A + Tier B by the LLM layer | | | |
| is_hcp | boolean | NPPES / LLM | HCP facet. true if the person holds a real clinical credential or practices clinically (definitive on an NPI/credential match), regardless of their current job title. Independent of is_executive. |
| is_executive | boolean | LLM | Executive facet. true for a leadership / administrative / business role (CEO, VP, Director, Dean, Chief Nursing Officer, …). Independent of is_hcp: a credentialed executive (e.g. a Chief Nursing Officer who is an RN/DNP) has both set to true. |
| current_employment + current_employment_source | string \| null | Tier B regex / LLM | Current employer. Prefers the NPPES/Tier-B inferred place_of_employment.name (_source: "tier_b_web_inference"); for non-HCPs with no such match, falls back to the LLM-extracted employer ("llm_generated:<model>"), e.g. a CEO's company. Null when no employer is evident. |
| current_position + current_position_source | string \| null | LLM (employer portal) / NPPES | Current job title/role. The LLM sources it from the employer's own official website / leadership page first (then curated directories / LinkedIn, then press), using one clean canonical title and never merging stale or multi-source titles, e.g. "Group Senior Vice President, Chief Nursing Executive". A bare NPPES credential (e.g. "DNP") is never used as a role. |
| highest_education + highest_education_source | string \| null | NPPES credential / LLM | Highest level of education / terminal degree, e.g. "Doctor of Medicine (M.D.)". Deterministic from a clear NPPES credential (_source: "nppes_credential"); otherwise LLM-inferred from web evidence ("llm_generated:<model>"), e.g. a "Professor of Medicine" → "Doctorate (M.D. and/or Ph.D.)". |
| position_function + position_function_source | string \| null | LLM | 1–2 sentence plain-language explanation of what someone in this position/specialty typically does, for a non-clinical reader. _source: "llm_generated:<model>". Null when no position was identified or the LLM layer is unavailable. |
| call_recommendation + call_recommendation_source | object \| null | LLM | { should_call, category, rationale, confidence }: whether NNI medical affairs should call on this person even if they are not an HCP (e.g. a professor of medicine, dean / department chair, KOL / thought leader, podcaster or online influencer, P&T / formulary committee member, professional-association member, patient-advocacy NGO / non-profit member, researcher, or payer). See the category enum below. _source carries provenance or an unavailable: <reason> note. |
The place_of_employment, current_employment, and job_title_inferred objects come from Tier B web search and are best-effort. Treat them as hints, not authoritative facts. The job_title / job_description / current_position fields are synthesized from NPPES credential + taxonomy and are deterministic for any NPI match.
position_function and call_recommendation are LLM-generated (added v0.3.0) from a single live model call per request. They are advisory judgments, not facts; call_recommendation in particular is a triage hint for whom to approach, not a vetted determination. The screener degrades gracefully: if the LLM layer is unconfigured or fails, both fields return null with a reason in *_source and the rest of the result is unaffected.
call_recommendation.category enum: HCP, KOL/Influencer, P&T/Formulary Committee, Patient Advocacy/NGO, Professional Association, Researcher/Academic, Payer/Managed Care, Internal/Own-Company, Industry/Other, Not Recommended, Unknown.
Status values
- matched: a single identity meets the confidence threshold. Cases 0, 1, 3, 3b, and 4 produce this.
- inconclusive: a top candidate exists but its score is below the 0.65 match threshold. alternates are returned for human review.
- not_hcp: Tier B found clear non-clinical role signals (Sales, Marketing, Realtor, etc.). NPI hits, if any, appear to be different individuals.
- unknown: nothing found in either tier. raw_hits from Tier B (if any) are returned for adjudication.
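A consumer will usually branch on status and source together. The sketch below shows one possible routing; the function name triage and the returned action labels are assumptions made for illustration and are not values the API emits.

```python
def triage(response):
    """Hypothetical consumer-side routing of a screener response."""
    match = response.get("match", {})
    status = match.get("status")
    if status == "matched":
        # Web-search-only matches are the lowest-trust source, so send them to a reviewer.
        if match.get("source") == "tier_b_web_search":
            return "adjudicate"
        return "accept"
    if status == "not_hcp":
        return "treat_as_non_hcp"
    # inconclusive and unknown: a person reviews alternates / raw_hits
    return "human_review"
```

Whether a not_hcp result is acted on automatically or reviewed first is a policy choice left to the consuming application.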
The 6 disposition cases
The function evaluates branches in order. The first that matches wins.
matched Case 0 — Tier A high-confidence
NPI Registry returns a candidate with score ≥ 0.85 AND ahead of the runner-up by at least 0.15. Tier B still runs in parallel to populate place_of_employment from web evidence — that's why tiers_run reports ["A", "B"] even on high-confidence Tier A matches.
{
"ok": true,
"match": {
"status": "matched",
"confidence": 0.95,
"source": "npi_direct",
"tier": "A",
"npi": "1992894257",
"name_prefix":"DR.",
"firstName": "ILENE", "middleName": "LAUREN", "lastName": "FRIEDMAN",
"credential": "M.D.",
"gender": "F",
"job_title": "Internal Medicine, M.D.",
"job_description": "Internal Medicine",
"primary_taxonomy": {
"code": "207R00000X",
"name": "Internal Medicine",
"license": "198905",
"license_state":"NY"
},
"secondary_taxonomies": [],
"practice_address": {
"street": "79 MIDDLEVILLE RD",
"street_2": null,
"city": "NORTHPORT",
"state": "NY",
"zip": "11768",
"phone": "631-261-0050",
"fax": "631-754-3017",
"formatted": "79 MIDDLEVILLE RD, NORTHPORT, NY 11768"
},
"mailing_address": null,
"additional_practice_locations": [],
"place_of_employment": { // present only when Tier B inferred one
"name": "Huntington Hospital",
"source": "tier_b_web_inference",
"evidence_url": "https://www.huntingtonhospital.org/..."
},
"job_title_inferred": null, // populated when LinkedIn-style "Title - Org" pattern hits
"enumerated_since": "2006-10-12",
"last_updated": "2024-08-15",
"match_signals": ["state_match", "city_match", "zip_match", "specialty_match"],
"match_reasoning": "Tier A match via variant \"exact:Ilene Friedman/NY\". Signals: state_match, city_match, zip_match, specialty_match."
},
"alternates": [],
"tiers_run": ["A", "B"]
}
matched Case 1 — Tier A medium + Tier B corroborates
NPI score is 0.65 ≤ x < 0.85, and the Tier B classifier confirms HCP signals (degree mentions like "MD" in name-verified, geo-aligned snippets).
{
"ok": true,
"match": {
"status": "matched",
"confidence": 0.75,
"source": "npi_direct" | "npi_fuzzy",
"tier": "A",
/* full NPI record fields */
"tier_b_evidence": [ /* up to 3 corroborating web hits */ ],
"match_reasoning": "Tier A direct match (0.75); Tier B corroborates HCP via web hits (degree=MD)."
},
"alternates": [ ... ],
"tiers_run": ["A", "B"]
}
not_hcp Case 2 — Tier B finds non-clinical role
Tier B's classifier hits at least one non-clinical role signal (Sales, Marketing, Realtor, VP, Attorney, etc.) and the NPI candidates appear to be different individuals (geographic or context mismatch).
{
"ok": true,
"match": {
"status": "not_hcp",
"confidence": 0.6,
"source": "tier_b_web_search",
"tier": "B",
"tier_a_hits": 1,
"flags": ["ROLE_NON_CLINICAL"],
"match_reasoning": "Tier B found 1 non-clinical role signal(s) in web hits. Tier A NPI hits (1) appear to be different individuals; geographic or context mismatch."
},
"alternates": [],
"tiers_run": ["A", "B"]
}
Confidence formula: min(0.85, 0.5 + 0.1 × negative_signals).
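As a worked instance of the formula, here is a minimal sketch; the function name is hypothetical, and rounding to two decimals is an assumption added only to keep floating-point noise out of the result.

```python
def not_hcp_confidence(negative_signals: int) -> float:
    """Case 2 confidence: min(0.85, 0.5 + 0.1 * negative_signals)."""
    return round(min(0.85, 0.5 + 0.1 * negative_signals), 2)
```

One negative signal gives 0.6, as in the example response above; four or more signals hit the 0.85 cap.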
matched Case 3 — Tier B finds HCP, no NPI
This case was introduced when the scope was broadened on 2026-05-18. Tier A couldn't place the person in NPPES, but Tier B web hits show a clear clinical degree/credential signal in geo-aligned, name-verified snippets.
Common for non-billing clinical roles: research nurses, educators, formulary pharmacists, administrators.
{
"ok": true,
"match": {
"status": "matched",
"confidence": 0.78,
"source": "tier_b_web_search",
"tier": "B",
"npi": null,
"firstName": "...", "lastName": "...",
"credential": "RN" | "PA" | "MD" | "...",
"degree": "RN" | "PA" | "MD" | "...",
"tier_b_evidence": [ ... ],
"flags": ["NOT_IN_NPI_REGISTRY"],
"match_reasoning": "Tier A could not place this name confidently in NPI Registry (top score 0.40). Tier B web hits show clear clinical degree/credential signal — likely an HCP not enumerated in NPI."
},
"alternates": [],
"tiers_run": ["A", "B"]
}
matched Case 3b — Spell-correction retry
Tier B's classifier detects that the last-name token most consistent with the corroborating hits is a Levenshtein-distance-1+ correction of the input. The function then re-runs Tier A with the corrected spelling — and if that returns a confident NPI hit, we surface the corrected record.
This is the deepest pipeline branch: tiers_run: ["A", "B", "A-retry"].
{
"ok": true,
"match": {
"status": "matched",
"confidence": 0.78,
"source": "npi_fuzzy_spell_corrected",
"tier": "A",
/* full NPI record from the corrected query */
"spell_corrected": { "from": "Moscaritols", "to": "Moscarito" },
"match_reasoning": "Input lastName \"Moscaritols\" appears to be a typo. Tier B web search surfaced \"Moscarito\" as a likely correction; Tier A re-run with the corrected spelling produced a confident NPI match."
},
"tiers_run": ["A", "B", "A-retry"]
}
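The retry described for Case 3b can be sketched as follows. This is an illustration under stated assumptions, not the function's actual code: tier_a is an injected stand-in for the Tier A lookup and is assumed to return candidate dicts sorted by descending score, and the 0.65 threshold is the match threshold quoted elsewhere in this document.

```python
MATCH_THRESHOLD = 0.65

def spell_correction_retry(input_rec, corrected_last_name, tier_a):
    """Return a corrected match dict, or None to fall through to the other cases."""
    if not corrected_last_name:
        return None
    if corrected_last_name.lower() == input_rec["lastName"].lower():
        return None  # same name, case-insensitively: nothing to retry
    corrected = {**input_rec, "lastName": corrected_last_name}
    candidates = tier_a(corrected)  # fresh Tier A pass with the corrected spelling
    if candidates and candidates[0]["score"] >= MATCH_THRESHOLD:
        return {
            **candidates[0],
            "source": "npi_fuzzy_spell_corrected",
            "spell_corrected": {"from": input_rec["lastName"], "to": corrected_last_name},
        }
    return None  # the caller continues with the original input
```

Passing the lookup in as a parameter keeps the sketch testable without calling NPPES.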
matched Case 4 — Tier A medium alone
Same shape as Case 1, but Tier B either ran without corroborating or wasn't available (no provider configured / out of credits). The reasoning explicitly says so.
{
"match": {
"status": "matched",
"confidence": 0.65,
"source": "npi_direct",
"tier": "A",
/* full NPI record */
"match_reasoning": "Tier A direct match at medium confidence (0.65). Tier B did not corroborate."
}
}
inconclusive Case 5 — Below match threshold
A top candidate exists but its score is < 0.65. alternates are returned for adjudication; raw_hits from Tier B (up to 5) too.
{
"match": {
"status": "inconclusive",
"confidence": 0.45,
"tier": "A",
"match_reasoning": "Tier A produced a candidate at score 0.45 (below the 0.65 match threshold). Tier B did not provide a clear signal."
},
"alternates": [ ... ],
"raw_hits": [ ... ]
}
unknown Case 6 — Nothing found
Tier A produced zero candidates after all 5 passes, and Tier B (if available) returned no name-verified credential signals.
{
"match": {
"status": "unknown",
"confidence": 0,
"tiers_tried": ["A", "B"],
"variants_tried": [ "exact:Foo Bar/NY", "variant:Foo Baz/NY", ... ],
"match_reasoning": "No matches in Tier A (NPI Registry, 7 variants tried). Tier B returned 0 hits, none with clear HCP credential signal."
},
"alternates": [],
"raw_hits": []
}
Tier A — NPI Registry cascade
A 5-pass progressive expansion against NPPES (the public CMS API at npiregistry.cms.hhs.gov). Free, no auth.
Name normalization (pre-cascade)
- Diacritic strip: José → Jose, Müller → Muller via NFD + combining-mark removal.
- Whitespace collapse + trim: runs of spaces become single spaces; leading/trailing punctuation stripped.
- Suffix & credential stripping: trailing tokens matching the suffix/credential vocab are removed before lookup:
  - Suffixes: jr, sr, ii, iii, iv, v
  - Degrees: md, do, dds, dmd, dpm, dpt, pharmd, dnp, phd, dvm, od
  - Credentials: rn, aprn, crna, cnm, np, pa-c, pa, rph, rrt, rdn, lcsw, msw
  - Academic: mba, msc, ms, ma, ba, bsn, msn, dnsc, edd, jd
  - Fellowships: facs, facp, fasco, faan, faonl, nea-bc, ne-bc, ahn-bc, ccrn, cphq
Dotted forms (e.g. M.D.) are normalized first by stripping periods.
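The normalization steps above could look roughly like this in Python. It is a sketch, not the project's JavaScript implementation: the token set is abbreviated, and the function names are hypothetical.

```python
import re
import unicodedata

# Abbreviated vocab for illustration; the lists above contain more tokens.
STRIP_TOKENS = {"jr", "sr", "ii", "iii", "iv", "v", "md", "do", "dds", "pharmd",
                "dnp", "phd", "rn", "aprn", "np", "pa-c", "pa", "rph", "mba", "facs"}

def strip_diacritics(text):
    # NFD splits "é" into "e" + a combining accent, which is then dropped.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def normalize_name(raw):
    """Return (clean_name, stripped_tokens)."""
    text = strip_diacritics(raw)
    text = re.sub(r"\s+", " ", text).strip(" .,;:-")
    tokens = text.split(" ")
    stripped = []
    # Drop trailing suffix/credential tokens, but never the last remaining token.
    while len(tokens) > 1:
        key = tokens[-1].replace(".", "").strip(",").lower()  # "M.D." -> "md"
        if key not in STRIP_TOKENS:
            break
        stripped.insert(0, tokens.pop())
    return " ".join(tokens).rstrip(","), stripped
```

Returning the stripped tokens separately matches the request-schema note that credentials are captured rather than discarded.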
Nickname expansion
A bidirectional table of ~110 canonical names with their common nicknames. The lookup walks both directions: input "Bob" expands to {Robert, Rob, Bob, Bobby, Robbie, Bert}; input "Robert" expands to the same set. Both male and female forms are covered.
Sample of the nickname table:
| Canonical | Variants |
|---|---|
| Robert | Rob, Bob, Bobby, Robbie, Bert |
| William | Will, Bill, Billy, Willie, Liam |
| Richard | Rick, Dick, Rich, Richie, Rickey |
| Michael | Mike, Mikey, Mick, Mickey |
| James | Jim, Jimmy, Jamie, Jay |
| John | Jack, Johnny, Jon |
| Anthony | Tony, Ant |
| Elizabeth | Liz, Beth, Betty, Eliza, Lizzie, Ellie, Libby, Betsy, Bess |
| Margaret | Maggie, Meg, Peggy, Margie, Madge, Greta |
| Catherine | Cathy, Kate, Katie, Kit, Cat |
| Reynaldo | Rey, Ronnie, Naldo |
Full table at src/lib/name_variants.js lines 6–149.
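A bidirectional lookup of this kind can be built from a one-directional table plus a reverse index. The sketch below uses a three-row excerpt of the table above; the structure and the function name are assumptions and are not taken from name_variants.js.

```python
NICKNAMES = {  # small excerpt of the sample table
    "robert": ["rob", "bob", "bobby", "robbie", "bert"],
    "william": ["will", "bill", "billy", "willie", "liam"],
    "anthony": ["tony", "ant"],
}

# Reverse index so that a nickname resolves back to its canonical name(s).
_REVERSE = {}
for canonical, variants in NICKNAMES.items():
    for variant in variants:
        _REVERSE.setdefault(variant, set()).add(canonical)

def expand_first_name(name):
    """Return the lowercase set of names to try for a given first name."""
    key = name.strip().lower()
    canonicals = {key} if key in NICKNAMES else _REVERSE.get(key, set())
    result = {key}
    for canonical in canonicals:
        result.add(canonical)
        result.update(NICKNAMES[canonical])
    return result
```

A name that is in neither direction of the table expands to just itself, so the lookup never drops the original input.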
Last-name variants
Hyphenated last names fan out:
"Smith-Jones" → ["Smith-Jones", "Smith Jones", "Smith", "Jones"]
Each variant is tried as a separate NPPES query (NPPES wildcard-suffixes every query automatically — see "wildcard trick" below).
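The hyphen fan-out is small enough to show in full. This is a minimal sketch with a hypothetical function name; it reproduces the ordering of the example above.

```python
def last_name_variants(last_name):
    """Fan a hyphenated last name out into the variants to query."""
    variants = [last_name]
    if "-" in last_name:
        parts = [p for p in last_name.split("-") if p]
        variants.append(" ".join(parts))  # "Smith Jones"
        variants.extend(parts)            # "Smith", "Jones"
    # De-duplicate while preserving order.
    seen, ordered = set(), []
    for v in variants:
        if v not in seen:
            seen.add(v)
            ordered.append(v)
    return ordered
```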
NPPES wildcard trick
By default, NPPES matches exactly on first_name and last_name. That misses compound first names ("SUSAN ANABELLE" stored as one field). The function appends a trailing * to both fields on every query, which catches compounds without breaking exact matches.
The 5-pass cascade
The swap pass queries with the name fields reversed: first=<Last>, last=<First>.
Tier B — Web search + classifier
Provider-agnostic search layer (Brave / Serper / Google CSE). Used to corroborate Tier A medium-confidence matches and to identify HCPs that aren't in NPPES.
Provider selection
Driven by two env vars on the Netlify site, plus a third that applies only to Google CSE:
WEB_SEARCH_PROVIDER = "serper" | "brave" | "google_cse"
WEB_SEARCH_API_KEY = "..."
GOOGLE_CSE_ID = "..." // only for google_cse
If neither is set, Tier B returns { available: false, reason: "..." } and the function falls through to a Tier-A-only response.
Currently configured: Serper.dev (paid plan, 50k queries/month).
The 5 queries Tier B fires per attendee
- Tightest: "<First Last>" "<City>" <ST> (name + quoted city + state). Catches local non-clinical context too (deliberately broad to detect "not_hcp").
- Name + city + clinical-role terms: "<First Last>" "<City>" <ST> doctor physician nurse pharmacist
- Name + employer + specialty: "<First Last>" <employer> <specialty> (only if those fields were provided)
- LinkedIn fallback: "<First Last>" "<City>" linkedin
- Typo tolerance: <First Last> <ST> doctor physician nurse pharmacist (unquoted, lets Google auto-spell-correct).
Queries are deduplicated and any whose body is essentially just the name are dropped. The provider returns up to 5 hits per query.
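The five templates, together with the de-duplication and name-only filtering, might be assembled like this. The function name and the way missing fields collapse are assumptions made for the sketch; the real builder may treat partial input differently.

```python
def build_queries(first, last, city=None, state=None, employer=None, specialty=None):
    """Build the Tier B query list, dropping name-only and duplicate queries."""
    name = f"{first} {last}"
    quoted = f'"{name}"'
    city_q = f'"{city}"' if city else ""
    roles = "doctor physician nurse pharmacist"
    raw = [
        f"{quoted} {city_q} {state or ''}",              # tightest
        f"{quoted} {city_q} {state or ''} {roles}",      # clinical-role terms
        f"{quoted} {employer or ''} {specialty or ''}",  # employer + specialty
        f"{quoted} {city_q} linkedin",                   # LinkedIn fallback
        f"{name} {state or ''} {roles}",                 # unquoted, typo-tolerant
    ]
    queries = []
    for q in raw:
        q = " ".join(q.split())  # collapse gaps left by missing fields
        if q in (quoted, name) or q in queries:
            continue             # name-only or duplicate
        queries.append(q)
    return queries
```

With only a first and last name, two of the five templates reduce to the bare name and are dropped, which leaves three queries.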
Classifier — classifyHits()
Up to the first 15 aggregated hits are inspected. For each, the title + snippet are scanned for three regex patterns:
Positive degree pattern
/\b(MD|D\.O\.|DO|DDS|DMD|DPM|DPT|PharmD|DNP|PhD|DVM|OD)\b/
Positive credential pattern
/\b(RN|APRN|CRNA|CNM|NP|PA-C|PA|RPh|RRT|RDN|NEA-BC|FAAN|FACS|FACP)\b/
Negative-role pattern
/\b(VP|Vice President|CEO|CFO|COO|Founder|Investor|Sales|Marketing|
Realtor|Real Estate|Attorney|Lawyer|Tenant|Tenants Association|
Activist|Politician|Councilman|Mayor)\b/i
Name verification — Levenshtein, both names required
Every positive hit must have both the input firstName and lastName fuzzy-matched to tokens in the snippet via Levenshtein distance. This was added 2026-05-18 after two false-positive incidents:
- "Kevin Weller / Jersey City NJ" — was matching a snippet about "Kevin John Weller, APRN" in New Hampshire (different state).
- "Joyce Moscaritols" (typo) — was matching "Dr. Michael Joyce, MD" because the input firstName "Joyce" appeared as someone else's last name.
Distance threshold per token: min(3, max(1, ceil(needle.length × 0.30))). So a 6-letter name allows up to 2 edits; a 10-letter name allows up to 3.
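The threshold and the both-names rule can be sketched as below. The Levenshtein routine is a standard dynamic-programming version and the tokenizer is a simple assumption; neither is taken from the project's source.

```python
import math
import re

def max_edits(needle):
    """Allowed edit distance: min(3, max(1, ceil(len * 0.30)))."""
    return min(3, max(1, math.ceil(len(needle) * 0.30)))

def levenshtein(a, b):
    """Classic row-by-row edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def name_verified(first, last, snippet):
    """True only if BOTH names fuzzy-match some token in the snippet."""
    tokens = [t.lower() for t in re.findall(r"[A-Za-z'-]+", snippet)]
    def hit(needle):
        n = needle.lower()
        return any(levenshtein(n, t) <= max_edits(n) for t in tokens)
    return hit(first) and hit(last)
```

Under this sketch the second incident above is rejected: "Joyce" matches a token in "Dr. Michael Joyce, MD", but no token is within range of "Moscaritols", so the hit fails verification.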
Geographic alignment
Each name-verified credential snippet is tagged:
- aligned: the input state code OR full state name appears in the snippet, OR the input city appears in the snippet
- mismatched: state codes appear in the snippet, but none match the input state
- unknown: no state info detected
State-code matching uses comma/space context ("City, ST" patterns) to avoid false positives like "MS Excel."
Decision logic
if (neg > 0 && name_matched_aligned === 0) → isHCP: false
else if (name_matched_aligned ≥ 1) → isHCP: true
else if (name_matched_count ≥ 2 && no mismatch
&& no negatives) → isHCP: true
else → isHCP: false
The classifier also returns a spell_corrected_last_name field whenever the name-verified last-name match has distance > 0 — i.e., the snippet's lastName isn't identical to the input's. This is what triggers the Case 3b retry.
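The decision logic reads as a plain if/else chain. Below is a direct Python transcription; the parameter names mirror the pseudocode, and has_mismatch is an assumed boolean standing in for the "no mismatch" condition.

```python
def is_hcp(neg, name_matched_aligned, name_matched_count, has_mismatch):
    """Transcription of the classifier's decision chain; first matching branch wins."""
    if neg > 0 and name_matched_aligned == 0:
        return False  # negative-role signal with no geo-aligned credential hit
    if name_matched_aligned >= 1:
        return True   # at least one name-verified, geo-aligned credential hit
    if name_matched_count >= 2 and not has_mismatch and neg == 0:
        return True   # several name-verified hits, nothing contradicting them
    return False
```

Note that an aligned hit wins even when negative-role signals are present, because the first branch only fires when there are no aligned hits at all.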
LLM enrichment layer (added v0.3.0)
After Tier A + Tier B settle on a disposition, a single live LLM call enriches the result with two judgment fields: position_function and call_recommendation. Runs on every disposition — including not_hcp and unknown — because the call recommendation is most useful for non-HCPs.
What it produces
- highest_education: highest level of education / terminal degree. Deterministic from a clear NPPES credential (M.D., PharmD, DNP, Ph.D., …); otherwise inferred from web evidence (e.g. "Dean of the School of Medicine" → a doctorate). null only when there's no signal.
- position_function: a 1–2 sentence plain-language explanation of what the identified position/specialty/credential typically does (core role & responsibility), for a non-clinical reader. null when no position was identified.
- call_recommendation: { should_call, category, rationale, confidence }. Confirmed HCPs return should_call: true, category: "HCP". For non-HCPs, the model judges from the web evidence whether the person is still a valuable medical-affairs contact in the diabetes/obesity/cardiometabolic space: a professor of medicine, dean / department chair, KOL / thought leader, podcaster / online influencer, P&T or formulary committee member, professional-association member, patient-advocacy NGO/non-profit member, researcher/academic, or payer/managed-care. A current Novo Nordisk / NNI employee returns category: "Internal/Own-Company" (own-company colleague, not an outreach target). Clearly unrelated roles or namesakes return should_call: false, category: "Not Recommended".
- social_profiles: an array of the person's reliably-attributed digital profiles: [{ platform, url, handle, followers, reliable, basis }]. Only profiles corroborated to this person (by name + employer/location/specialty) are included; likely namesakes are excluded. followers is populated only when a snippet states it.
- dol: Digital Opinion Leader assessment: { is_dol, tier, topics, rationale, confidence } where tier ∈ none | emerging | established | leading. Grades digital influence in the diabetes/obesity/cardiometabolic space from the reliable profiles + evidence (audience size, posting cadence, podcast/video hosting, conference presence). A plain profile with no influence signal stays tier: "none".
- identity_ids: resolved cross-reference ids { cdm_id, ims_id, npi_id } from the OSP identity service (looked up by last name + first name + ZIP). Any field may be null. The npi_id here independently corroborates the Tier-A NPI match.
- prescribing_data: IQVIA-style prescribing for NNI cardiometabolic products, fetched by ims_id: { source, ims_id, total_trx, total_nbrx, prescribed[], products[] }. Any cardiometabolic prescribing (total_trx > 0) is treated by the recommendation gate as a direct in-scope tie, so a clinician in an adjacent specialty who actually prescribes (e.g. a neurologist on Ozempic) is recommended, while a zero-TRx result is affirmative evidence of non-relevance.
Specialty-relevance gate: a confirmed HCP is only should_call: true when their specialty is core to NNI's diabetes/obesity/cardiometabolic focus (endocrinology, primary care, cardiology, nephrology, obesity medicine) OR there is specific evidence of a cardiometabolic tie — prescribing data, or a documented sub-focus (e.g. diabetic neuropathy). HCPs in unrelated specialties return should_call: false with a potential_relevance note describing what would make them relevant. A current Novo Nordisk / NNI employee returns category: "Internal/Own-Company".
Configuration
Driven by env vars on the Netlify site:
LLM_PROVIDER = "anthropic" | "openai" // default: anthropic
LLM_API_KEY = "..." // provider API key
LLM_MODEL = "claude-haiku-4-5-20251001" // default per provider
LLM_TIMEOUT_MS = "6000" // optional per-call timeout
Graceful degradation: if LLM_API_KEY is unset, or the call fails / times out / returns unparseable output, both fields come back null with a reason in their *_source field. The screening result is never blocked on the LLM. The call is also skipped (no spend) when there is no position and no web signal to reason about.
Identical prompts are de-duplicated by an ephemeral per-container memo (saves tokens during batch runs); this is process memory only, not persistence — the stateless guarantee holds.
Spell-correction retry (Case 3b)
The deepest sequential branch in the pipeline — tiers_run: ["A", "B", "A-retry"].
When Tier B's classifier emits a spell_corrected_last_name that differs from the input's lastName (case-insensitive), the function:
- Builds correctedInput = { ...input, lastName: spell_corrected_last_name }
- Calls tierA_npiFuzzy(correctedInput), a fresh 5-pass NPPES cascade with the corrected name
- If the retry's top candidate scores ≥ 0.65, returns the corrected match with source: "npi_fuzzy_spell_corrected" and a spell_corrected: { from, to } field
- Otherwise falls through to Cases 1–6 with original input
The misspelled input "Moscaritols" still returns the correct NPI record for Dr. Joyce Moscarito, with the corrected spelling labeled in the response so the consumer knows it was repaired.
Scoring & confidence
Every NPI candidate gets a score in [0, 1]. The base is 0.5 (caller already filtered by name); signals add or subtract.
Signal weights
| Signal | When it fires | Δ score |
|---|---|---|
| state_match | Input state == NPI practice OR mailing state | +0.20 |
| state_mismatch | Both input and NPI have states, and they differ | −0.25 |
| city_match | Input city (lowercased) == NPI practice/mailing city | +0.10 |
| zip_match | Input ZIP (5-digit) == NPI practice ZIP (5-digit) | +0.10 |
| specialty_match | Input specialty_hint appears in NPI primary taxonomy description (either direction) | +0.15 |
| specialty_soft_mismatch | Both have specialty info but no substring overlap | −0.03 |
| employer_substring | Input employer substring-matches NPI practice address_1 | +0.08 |
Final score is clamped to [0, 1].
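The additive scoring can be expressed in a few lines. In this sketch the weights are copied from the table above, while the function name and the two-decimal rounding (used only to suppress floating-point noise) are assumptions.

```python
BASE_SCORE = 0.5
WEIGHTS = {
    "state_match": 0.20,
    "state_mismatch": -0.25,
    "city_match": 0.10,
    "zip_match": 0.10,
    "specialty_match": 0.15,
    "specialty_soft_mismatch": -0.03,
    "employer_substring": 0.08,
}

def score_candidate(signals):
    """Sum the deltas of the fired signals onto the base, then clamp to [0, 1]."""
    total = BASE_SCORE + sum(WEIGHTS[s] for s in signals)
    return round(min(1.0, max(0.0, total)), 2)
```

For example, a candidate with only a state and city match scores 0.5 + 0.20 + 0.10 = 0.80, while a lone state mismatch drops it to 0.25.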
Disposition thresholds
| Top score | Runner-up gap | Disposition |
|---|---|---|
| ≥ 0.85 | ≥ 0.15 ahead of #2 | matched high confidence — Case 0 short-circuit |
| ≥ 0.85 | < 0.15 ahead of #2 | Falls through to Tier B (ambiguous top hit) |
| 0.65 – 0.84 | any | matched medium confidence (Cases 1 / 4) |
| < 0.65 (but > 0) | any | inconclusive Case 5 |
| 0 | — | unknown Case 6 |
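The threshold table maps onto a short function. The return labels here are hypothetical shorthand for the table rows, and the gap is rounded before comparison because a subtraction such as 0.95 − 0.80 does not come out as exactly 0.15 in binary floating point.

```python
def tier_a_disposition(top_score, runner_up_score=None):
    """Classify the Tier A outcome from the top score and the runner-up gap."""
    if top_score <= 0:
        return "unknown"         # Case 6
    if top_score < 0.65:
        return "inconclusive"    # Case 5
    if top_score < 0.85:
        return "matched_medium"  # Cases 1 / 4
    gap = round(top_score - (runner_up_score or 0.0), 2)
    # A high score only counts as Case 0 when it clearly leads the runner-up.
    return "matched_high" if gap >= 0.15 else "needs_tier_b"
```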
Full pipeline
The complete sequence of branches from input to response.
- aDisp.confidence ≥ 0.85 → return matched Case 0 immediately
- spell_corrected_last_name ≠ input.lastName → tierA_npiFuzzy with corrected name
- Corrected match returned with the spell_corrected annotation
Maximum external HTTP calls per screening
| Source | Max calls | When |
|---|---|---|
| NPPES (Tier A #1) | ~10 | One per name variant — exact, nicknames, hyphen splits, no-state, initial, swap |
| Serper (Tier B) | 5 | 4 quoted variants + 1 unquoted spell-tolerant |
| NPPES (Tier A retry) | ~10 | Only fires on Case 3b spell-correction |
| Total worst case | ~25 outbound HTTPS calls | Per single screening |
HTTP status codes
| Code | When |
|---|---|
| 200 + ok:true | Screening completed: any of matched / inconclusive / not_hcp / unknown |
| 204 | CORS preflight (OPTIONS) |
| 400 | Malformed JSON body, or firstName / lastName missing |
| 401 | Missing or wrong X-API-Key header |
| 405 | Any method other than POST or OPTIONS |
| 502 | Tier A error (NPPES upstream failure). Tier B errors do not propagate as 502 — they degrade the response to Tier-A-only with reasoning that notes Tier B was unavailable. |
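On the client, these codes can be turned into actions before the body is inspected. The mapping below is only a sketch: the function name and the action labels are assumptions, and whether to retry on 502 is a choice for the caller.

```python
def interpret_status(status_code, body=None):
    """Map an HTTP status from /api/screen to a hypothetical client action."""
    if status_code == 200 and body and body.get("ok"):
        return "process_result"  # any of the four dispositions
    if status_code == 400:
        return "fix_request"     # malformed JSON or missing firstName / lastName
    if status_code == 401:
        return "check_api_key"   # missing or wrong X-API-Key
    if status_code == 405:
        return "use_post"        # only POST (and OPTIONS preflight) are accepted
    if status_code == 502:
        return "retry_later"     # NPPES upstream failure in Tier A
    return "unexpected"
```

Tier B failures never reach this function as errors, since they arrive as an ordinary 200 response with Tier-A-only reasoning.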
Stateless guarantee
No persistence anywhere. Each request is independent.
This is an explicit project decision dated 2026-05-18. No database, no call logs, no audit trail. The function code has no fs.writeFile, no Supabase client, no logging service. The only writes the function performs are the response body itself.