NNI MA HCP Screener Docs

Overview

A stateless REST API that takes a single convention-booth attendee and tries to identify them as a true Healthcare Professional (HCP) — returning degree and specialty when possible.

The upstream pipeline has two earlier identification stages: Level 1 matches against a pharma customer database (CDMI), and Level 2 attempts a direct NPI Registry exact match. This screener is Level 3 — the fallback when both upstream tiers fail.

What "HCP" means here

Any licensed clinician identifiable via degree (MD / DO / DDS / DMD / DPM / DPT / PharmD / DNP / OD) or credential string (RN / NP / PA / CNM / CRNA / RPh, etc.). NPI is a strong corroborating signal but not a hard requirement — research nurses, educators, and pharmacists in non-billing roles can be HCPs without being enumerated in NPPES.

Endpoint

POST https://nni-ma-hcp-screener.netlify.app/api/screen

Also reachable directly at POST /.netlify/functions/screen (the /api/screen path is a Netlify redirect).

Stateless by design. The screener stores nothing: no call log, no persistent cache, no audit trail. Each request is independent.

Quickstart

Identify a single attendee with one HTTP call.

cURL

curl -X POST "https://nni-ma-hcp-screener.netlify.app/api/screen" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_API_KEY_HERE" \
  -d '{
        "firstName":      "Ilene",
        "lastName":       "Friedman",
        "state":          "NY",
        "city":           "Northport",
        "postalCode":     "11768",
        "specialty_hint": "Internal Medicine"
      }'

JavaScript (fetch)

const res = await fetch('https://nni-ma-hcp-screener.netlify.app/api/screen', {
  method:  'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-API-Key':    'YOUR_API_KEY_HERE'
  },
  body: JSON.stringify({
    firstName:      'Ilene',
    lastName:       'Friedman',
    state:          'NY',
    city:           'Northport',
    postalCode:     '11768',
    specialty_hint: 'Internal Medicine'
  })
});
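const data = await res.json();
// Error responses (e.g. 401) carry { ok: false, error } instead of a match object.
if (!res.ok) throw new Error(`HTTP ${res.status}: ${data.error}`);
console.log(data.match.status, data.match.confidence, data.match.npi);

Python (requests)

import requests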

resp = requests.post(
  "https://nni-ma-hcp-screener.netlify.app/api/screen",
  headers={
    "Content-Type": "application/json",
    "X-API-Key":    "YOUR_API_KEY_HERE",
  },
  json={
    "firstName":      "Ilene",
    "lastName":       "Friedman",
    "state":          "NY",
    "city":           "Northport",
    "postalCode":     "11768",
    "specialty_hint": "Internal Medicine",
  },
  timeout=30,
)
resp.raise_for_status()  # e.g. 401 on a bad key, 502 on an NPPES upstream failure
data = resp.json()
print(data["match"]["status"], data["match"]["confidence"], data["match"].get("npi"))

A successful response (HTTP 200) wraps the result in a small envelope:

{
  "ok": true,
  "match":      { "status": "matched", "confidence": 0.95, ... },
  "alternates": [ ... ],
  "tiers_run":  ["A"],
  "input_echo": { "firstName": "Ilene", "lastName": "Friedman", "state": "NY", "city": "Northport" }
}

Authentication

Every request must carry an X-API-Key header.

X-API-Key: 449ba0...e74650ebad948

Without the header (or with a wrong value) the function returns HTTP 401 with { "ok": false, "error": "Invalid or missing X-API-Key header." }.
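A minimal sketch of that header check, in Python for illustration (the deployed function is JavaScript). It assumes the key is read from SCREENER_API_KEY and that header names may arrive in any casing. The `check_api_key` name and the use of a constant-time comparison (`hmac.compare_digest`) are assumptions of this sketch, not a description of screen.js.

```python
import hmac
import os
from typing import Mapping, Optional, Tuple

UNAUTHORIZED = {"ok": False, "error": "Invalid or missing X-API-Key header."}


def check_api_key(headers: Mapping[str, str],
                  expected: Optional[str] = None) -> Tuple[int, Optional[dict]]:
    """Return (200, None) if X-API-Key matches the configured key, else (401, body)."""
    if expected is None:
        # Hypothetical default: read the key from the same env var the site uses.
        expected = os.environ.get("SCREENER_API_KEY", "")
    # HTTP header names are case-insensitive, so accept "x-api-key" in any casing.
    supplied = next((v for k, v in headers.items() if k.lower() == "x-api-key"), "")
    if not expected or not supplied:
        return 401, UNAUTHORIZED  # an unset server key must never authorize anything
    # compare_digest's timing does not reveal how many leading characters matched.
    if not hmac.compare_digest(supplied.encode(), expected.encode()):
        return 401, UNAUTHORIZED
    return 200, None
```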

The key is stored as the SCREENER_API_KEY env var on the Netlify site. To rotate: generate a new value, PUT it to the same env var via the Netlify API, redeploy, notify upstream.

This API key gates Serper credit consumption. Treat it as a budget control, not just an authentication token: every call that reaches Tier B spends Serper credits (up to 5 queries per screening).

Request schema

JSON body, application/json. Only first + last name are strictly required. The richer the input, the better the disambiguation.

| Field | Required? | Type / format | Purpose & normalization |
|---|---|---|---|
| firstName | Required | string | Lowercased + whitespace-collapsed. Trailing credentials are auto-stripped ("Anthony MD" → "Anthony", credentials captured separately). |
| lastName | Required | string | Same normalization. Hyphens preserved; splits into variants in Tier A ("Smith-Jones" → ["Smith-Jones", "Smith Jones", "Smith", "Jones"]). |
| middleName (or middleInitial) | Optional | string | Disambiguation aid; passed to the LLM enrichment context. (The web console makes this a required field; enter "none" if none.) |
| organization (or employer) | Recommended | string | Strongest disambiguator for non-HCPs. Drives LinkedIn- and employer-site-led Tier B queries and the LLM enrichment. Also substring-matched against NPI practice-address line 1 (+0.08 on hit). The web console requires it (enter "none" if unknown). |
| suffix | Optional | string | Jr / Sr / III / etc. Currently only echoed in the input. |
| email | Optional | string | Lowercased. Domain used as a corroborating employer signal. |
| city | Optional | string | Used as a geo signal. Adds +0.10 to Tier A score on match. Also used by the Tier B classifier to disambiguate namesakes in different cities. |
| state | Optional | 2-letter code | Uppercased + trimmed. Filters initial Tier A passes. Adds 0.20 on match, subtracts 0.25 on mismatch. The Tier B classifier uses it for geo alignment. |
| postalCode (or zip) | Optional | 5-digit string | Non-digits stripped, sliced to first 5. Adds +0.10 on exact match against NPI practice ZIP. |
| specialty_hint (or specialtyHint) | Optional | string | Substring-matched (either direction) against NPI primary taxonomy description. Adds +0.15 on match, −0.03 on soft mismatch. |
| convention | Optional | string | Free-form metadata. Currently not used in scoring; accepted (not stored) for future analytics. |
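The alias handling and per-field rules in this table can be sketched as a small pre-processing step. This is illustrative only: the `normalize_request` and `BadRequest` names, the error text, and skipping `null` values are assumptions. The field names, aliases, and normalization rules come from the table. Credential stripping is left out here.

```python
import re
from typing import Any, Dict

# Alias -> canonical field name, per the request table.
ALIASES = {
    "middleInitial": "middleName",
    "employer": "organization",
    "zip": "postalCode",
    "specialtyHint": "specialty_hint",
}


class BadRequest(ValueError):
    """Raised for input that should produce HTTP 400 (hypothetical)."""


def normalize_request(body: Dict[str, Any]) -> Dict[str, str]:
    out: Dict[str, str] = {}
    for key, value in body.items():
        if value is None:
            continue
        canonical = ALIASES.get(key, key)
        if key != canonical and canonical in out:
            continue  # the canonical spelling wins when both forms are sent
        out[canonical] = str(value)

    for field in ("firstName", "lastName"):
        # Lowercase, collapse runs of whitespace, trim.
        out[field] = re.sub(r"\s+", " ", out.get(field, "")).strip().lower()
        if not out[field]:
            raise BadRequest(f"{field} is required")

    if "state" in out:
        out["state"] = out["state"].strip().upper()
    if "postalCode" in out:
        out["postalCode"] = re.sub(r"\D", "", out["postalCode"])[:5]  # "11768-1234" -> "11768"
    if "email" in out:
        out["email"] = out["email"].strip().lower()
    return out
```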

Minimum vs. recommended input

Minimum: { "firstName": "...", "lastName": "..." } — works but low-confidence (no geo signals).

Recommended: First + Last + postalCode + state + city + organization (+ specialty_hint when known). For common names — and for non-HCPs at small specialized meetings with no badge data — organization is the single most valuable disambiguator.

Response schema

Every successful response shares a common envelope. Status-specific fields nest under match.

{
  "ok": true,
  "match": {
    "status":     "matched" | "inconclusive" | "not_hcp" | "unknown",
    "confidence": 0.0 - 1.0,
    "source":     "npi_direct" | "npi_fuzzy" | "npi_fuzzy_spell_corrected" | "tier_b_web_search",
    "tier":       "A" | "B",
    // ... status-specific fields below ...
    "match_reasoning": "human-readable string explaining why this disposition"
  },
  "alternates": [ /* up to 4 runner-up matches */ ],
  "tiers_run":  ["A"] | ["A", "B"] | ["A", "B", "A-retry"],
  "input_echo": { "firstName": "...", "lastName": "...", "state": "...", "city": "..." }
}

Source values

| Source | Meaning | Trust |
|---|---|---|
| npi_direct | NPI Registry returned this person on a literal first+last variant. No fuzzy expansion needed. | Highest |
| npi_fuzzy | NPI Registry match via a name variant (nickname, hyphen split, wildcard). Still NPI-authoritative. | High |
| npi_fuzzy_spell_corrected | Input had a typo; Tier B suggested a correction; the Tier A retry with the corrected spelling hit. Returns the corrected NPI record. | High |
| tier_b_web_search | No reliable NPI hit; classification came from web search results. Used for non-NPI HCPs (research nurses, etc.) and for not_hcp dispositions. | Lowest (adjudicate) |

Match-object field reference

| Field | Type | Where it comes from | Notes |
|---|---|---|---|
| status | string | Disposition logic | matched / inconclusive / not_hcp / unknown |
| confidence | 0.0–1.0 | Scoring | ≥0.85 high · 0.65–0.84 medium · <0.65 inconclusive |
| source | string | Pipeline branch | npi_direct / npi_fuzzy / npi_fuzzy_spell_corrected / tier_b_web_search |
| tier | "A" \| "B" | Pipeline branch | Which tier produced the match |
| npi | string \| null | NPPES | 10-digit NPI; null for Case-3 non-NPI HCPs |
| name_prefix | string \| null | NPPES basic.name_prefix | e.g. "DR." |
| firstName / middleName / lastName / name_suffix | strings | NPPES | Raw NPPES casing |
| credential | string \| null | NPPES | e.g. "M.D.", "RN", "PA-C" |
| gender | "M" \| "F" \| null | NPPES | Self-reported |
| job_title | string \| null | Composite: credential + taxonomy | e.g. "Internal Medicine, M.D.". Synthesized; not a raw NPPES field. |
| job_description | string \| null | NPPES taxonomy desc | Same value as primary_taxonomy.name, surfaced as a friendly alias. |
| primary_taxonomy | object | NPPES | { code, name, license, license_state } |
| secondary_taxonomies | array | NPPES | Multi-specialty providers' additional taxonomies |
| practice_address | object | NPPES address with purpose=LOCATION | { street, street_2, city, state, zip, phone, fax, formatted } |
| mailing_address | object \| null | NPPES address with purpose=MAILING | Same shape as practice_address; null if same as practice |
| additional_practice_locations | array | NPPES practiceLocations | Multi-site practitioners |
| place_of_employment | object \| null | Tier B inference | { name, source, evidence_url }, extracted from web snippets matching "at <Org>" / LinkedIn patterns. source is always "tier_b_web_inference"; never authoritative. |
| job_title_inferred | object \| null | Tier B inference | Same shape; only set when the LinkedIn-style "Title · Org" pattern fires. |
| enumerated_since | YYYY-MM-DD | NPPES | When the NPI was issued |
| last_updated | YYYY-MM-DD | NPPES | When the NPPES record was last touched |
| match_signals | array of strings | Scoring | Which Δ-score signals fired |
| match_reasoning | string | Function | Human-readable explanation |
| flags | array | Function | e.g. ["NOT_IN_NPI_REGISTRY"], ["ROLE_NON_CLINICAL"] |
| spell_corrected | object \| absent | Case 3b only | { from, to } |
| tier_b_evidence | array | Tier B classifier | Up to 3 supporting web hits (Cases 1, 2, 3) |

Enrichment fields (v0.3.x), derived after Tier A + Tier B by the LLM layer:

| Field | Type | Where it comes from | Notes |
|---|---|---|---|
| is_hcp | boolean | NPPES / LLM | HCP facet. true if the person holds a real clinical credential or practices clinically (definitive on an NPI/credential match), regardless of their current job title. Independent of is_executive. |
| is_executive | boolean | LLM | Executive facet. true for a leadership / administrative / business role (CEO, VP, Director, Dean, Chief Nursing Officer, …). Independent of is_hcp: a credentialed executive (e.g. a Chief Nursing Officer who is an RN/DNP) is true for both. |
| current_employment + current_employment_source | string \| null | Tier B regex / LLM | Current employer. Prefers the NPPES/Tier-B inferred place_of_employment.name (_source: "tier_b_web_inference"); for non-HCPs with no such match, falls back to the LLM-extracted employer ("llm_generated:<model>"), e.g. a CEO's company. Null when no employer is evident. |
| current_position + current_position_source | string \| null | LLM (employer portal) / NPPES | Current job title/role. The LLM sources it from the employer's own official website / leadership page first (then curated directories / LinkedIn, then press), using one clean canonical title and never merging stale or multi-source titles, e.g. "Group Senior Vice President, Chief Nursing Executive". A bare NPPES credential (e.g. "DNP") is never used as a role. |
| highest_education + highest_education_source | string \| null | NPPES credential / LLM | Highest level of education / terminal degree, e.g. "Doctor of Medicine (M.D.)". Deterministic from a clear NPPES credential (_source: "nppes_credential"); otherwise LLM-inferred from web evidence ("llm_generated:<model>"), e.g. a "Professor of Medicine" → "Doctorate (M.D. and/or Ph.D.)". |
| position_function + position_function_source | string \| null | LLM | 1–2 sentence plain-language explanation of what someone in this position/specialty typically does, for a non-clinical reader. _source: "llm_generated:<model>". Null when no position was identified or the LLM layer is unavailable. |
| call_recommendation + call_recommendation_source | object \| null | LLM | { should_call, category, rationale, confidence }: whether NNI medical affairs should call on this person even if they are not an HCP, e.g. a professor of medicine, dean / department chair, KOL / thought leader, podcaster or online influencer, P&T / formulary committee member, professional-association member, patient-advocacy NGO / non-profit member, researcher, or payer. See the category enum below. _source carries provenance or an unavailable: <reason> note. |
NPPES has no native "employer" or "job title" fields. The place_of_employment and job_title_inferred objects come from Tier B web search, and current_employment either reuses that inference or falls back to an LLM-extracted employer; all of them are best-effort. Treat them as hints, not authoritative facts. The job_title and job_description fields are synthesized from NPPES credential + taxonomy and are deterministic for any NPI match; current_position is sourced by the LLM layer where possible, as described in the field reference above.
position_function and call_recommendation are LLM-generated (added v0.3.0) from a single live model call per request. They are advisory judgments, not facts — call_recommendation in particular is a triage hint for whom to approach, not a vetted determination. The screener degrades gracefully: if the LLM layer is unconfigured or fails, both fields return null with a reason in *_source and the rest of the result is unaffected.

call_recommendation.category enum: HCP, KOL/Influencer, P&T/Formulary Committee, Patient Advocacy/NGO, Professional Association, Researcher/Academic, Payer/Managed Care, Internal/Own-Company, Industry/Other, Not Recommended, Unknown.

Status values

  • matched — a single identity meets the confidence threshold. Cases 0, 1, 3, 3b, and 4 produce this.
  • inconclusive — a top candidate exists but its score is below the 0.65 match threshold. alternates are returned for human review.
  • not_hcp — Tier B found clear non-clinical role signals (Sales, Marketing, Realtor, etc.). NPI hits, if any, appear to be different individuals.
  • unknown — nothing found in either tier. raw_hits from Tier B (if any) are returned for adjudication.

The disposition cases (0–6, plus 3b)

The function evaluates branches in order. The first that matches wins.

Case 0: Tier A high confidence (status: matched)

NPI Registry returns a candidate with score ≥ 0.85 AND ahead of the runner-up by at least 0.15. Tier B still runs in parallel to populate place_of_employment from web evidence — that's why tiers_run reports ["A", "B"] even on high-confidence Tier A matches.

{
  "ok": true,
  "match": {
    "status":     "matched",
    "confidence": 0.95,
    "source":     "npi_direct",
    "tier":       "A",
    "npi":        "1992894257",
    "name_prefix":"DR.",
    "firstName":  "ILENE", "middleName": "LAUREN", "lastName": "FRIEDMAN",
    "credential": "M.D.",
    "gender":     "F",
    "job_title":       "Internal Medicine, M.D.",
    "job_description": "Internal Medicine",
    "primary_taxonomy": {
      "code":         "207R00000X",
      "name":         "Internal Medicine",
      "license":      "198905",
      "license_state":"NY"
    },
    "secondary_taxonomies": [],
    "practice_address": {
      "street":    "79 MIDDLEVILLE RD",
      "street_2":  null,
      "city":      "NORTHPORT",
      "state":     "NY",
      "zip":       "11768",
      "phone":     "631-261-0050",
      "fax":       "631-754-3017",
      "formatted": "79 MIDDLEVILLE RD, NORTHPORT, NY 11768"
    },
    "mailing_address":               null,
    "additional_practice_locations": [],
    "place_of_employment": {            // present only when Tier B inferred one
      "name":         "Huntington Hospital",
      "source":       "tier_b_web_inference",
      "evidence_url": "https://www.huntingtonhospital.org/..."
    },
    "job_title_inferred": null,         // populated when LinkedIn-style "Title - Org" pattern hits
    "enumerated_since":   "2006-10-12",
    "last_updated":       "2024-08-15",
    "match_signals":      ["state_match", "city_match", "zip_match", "specialty_match"],
    "match_reasoning":    "Tier A match via variant \"exact:Ilene Friedman/NY\". Signals: state_match, city_match, zip_match, specialty_match."
  },
  "alternates": [],
  "tiers_run":  ["A", "B"]
}

Case 1: Tier A medium + Tier B corroborates (status: matched)

The top NPI score falls in the range 0.65 ≤ score < 0.85, and the Tier B classifier confirms HCP signals (degree mentions like "MD" in name-verified, geo-aligned snippets).

{
  "ok": true,
  "match": {
    "status":     "matched",
    "confidence": 0.75,
    "source":     "npi_direct" | "npi_fuzzy",
    "tier":       "A",
    /* full NPI record fields */
    "tier_b_evidence": [ /* up to 3 corroborating web hits */ ],
    "match_reasoning": "Tier A direct match (0.75); Tier B corroborates HCP via web hits (degree=MD)."
  },
  "alternates": [ ... ],
  "tiers_run":  ["A", "B"]
}

Case 2: Tier B finds a non-clinical role (status: not_hcp)

Tier B's classifier hits at least one non-clinical role signal (Sales, Marketing, Realtor, VP, Attorney, etc.) and the NPI candidates appear to be different individuals (geographic or context mismatch).

{
  "ok": true,
  "match": {
    "status":     "not_hcp",
    "confidence": 0.6,
    "source":     "tier_b_web_search",
    "tier":       "B",
    "tier_a_hits": 1,
    "flags":      ["ROLE_NON_CLINICAL"],
    "match_reasoning": "Tier B found 1 non-clinical role signal(s) in web hits. Tier A NPI hits (1) appear to be different individuals; geographic or context mismatch."
  },
  "alternates": [],
  "tiers_run":  ["A", "B"]
}

Confidence formula: min(0.85, 0.5 + 0.1 × negative_signals).
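Worked through for a few counts, the formula saturates at four signals: 1 → 0.6, 2 → 0.7, 3 → 0.8, and 4 or more → 0.85 (the cap). A small Python sketch of it follows. The function name is hypothetical, and rejecting a count of zero reflects the Case 2 precondition of at least one non-clinical signal.

```python
def not_hcp_confidence(negative_signals: int) -> float:
    """Case 2 confidence: min(0.85, 0.5 + 0.1 × negative_signals)."""
    if negative_signals < 1:
        # Case 2 only applies when at least one non-clinical role signal fired.
        raise ValueError("Case 2 requires at least one non-clinical role signal")
    # round() hides binary floating-point noise such as 0.5 + 0.1 * 3.
    return round(min(0.85, 0.5 + 0.1 * negative_signals), 2)
```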

Case 3: Tier B finds an HCP, no NPI (status: matched)

Added with the 2026-05-18 scope broadening. Tier A couldn't place the person in NPPES, but Tier B web hits show a clear clinical degree/credential signal in geo-aligned, name-verified snippets.

Common for non-billing clinical roles: research nurses, educators, formulary pharmacists, administrators.

{
  "ok": true,
  "match": {
    "status":     "matched",
    "confidence": 0.78,
    "source":     "tier_b_web_search",
    "tier":       "B",
    "npi":        null,
    "firstName":  "...", "lastName": "...",
    "credential": "RN" | "PA" | "MD" | "...",
    "degree":     "RN" | "PA" | "MD" | "...",
    "tier_b_evidence": [ ... ],
    "flags":      ["NOT_IN_NPI_REGISTRY"],
    "match_reasoning": "Tier A could not place this name confidently in NPI Registry (top score 0.40). Tier B web hits show clear clinical degree/credential signal — likely an HCP not enumerated in NPI."
  },
  "alternates": [],
  "tiers_run":  ["A", "B"]
}

Case 3b: Spell-correction retry (status: matched)

Tier B's classifier detects that the last-name token most consistent with the corroborating hits differs from the input by a Levenshtein distance of 1 or more. The function then re-runs Tier A with the corrected spelling, and if that returns a confident NPI hit, the corrected record is surfaced.

This is the deepest pipeline branch: tiers_run: ["A", "B", "A-retry"].

{
  "ok": true,
  "match": {
    "status":     "matched",
    "confidence": 0.78,
    "source":     "npi_fuzzy_spell_corrected",
    "tier":       "A",
    /* full NPI record from the corrected query */
    "spell_corrected": { "from": "Moscaritols", "to": "Moscarito" },
    "match_reasoning": "Input lastName \"Moscaritols\" appears to be a typo. Tier B web search surfaced \"Moscarito\" as a likely correction; Tier A re-run with the corrected spelling produced a confident NPI match."
  },
  "tiers_run": ["A", "B", "A-retry"]
}

Case 4: Tier A medium alone (status: matched)

Same shape as Case 1, but Tier B either ran without corroborating or wasn't available (no provider configured / out of credits). The reasoning explicitly says so.

{
  "match": {
    "status":     "matched",
    "confidence": 0.65,
    "source":     "npi_direct",
    "tier":       "A",
    /* full NPI record */
    "match_reasoning": "Tier A direct match at medium confidence (0.65). Tier B did not corroborate."
  }
}

Case 5: Below match threshold (status: inconclusive)

A top candidate exists but its score is < 0.65. The alternates and up to 5 raw_hits from Tier B are returned for adjudication.

{
  "match": {
    "status":     "inconclusive",
    "confidence": 0.45,
    "tier":       "A",
    "match_reasoning": "Tier A produced a candidate at score 0.45 (below the 0.65 match threshold). Tier B did not provide a clear signal."
  },
  "alternates": [ ... ],
  "raw_hits":   [ ... ]
}

Case 6: Nothing found (status: unknown)

Tier A produced zero candidates after all 5 passes, and Tier B (if available) returned no name-verified credential signals.

{
  "match": {
    "status":      "unknown",
    "confidence":  0,
    "tiers_tried": ["A", "B"],
    "variants_tried": [ "exact:Foo Bar/NY", "variant:Foo Baz/NY", ... ],
    "match_reasoning": "No matches in Tier A (NPI Registry, 7 variants tried). Tier B returned 0 hits, none with clear HCP credential signal."
  },
  "alternates": [],
  "raw_hits":   []
}

Tier A — NPI Registry cascade

A 5-pass progressive expansion against NPPES (the public CMS API at npiregistry.cms.hhs.gov). Free, no auth.

Name normalization (pre-cascade)

  • Diacritic strip: José → Jose, Müller → Muller via NFD + combining-mark removal.
  • Whitespace collapse + trim — runs of spaces become single spaces; leading/trailing punctuation stripped.
  • Suffix & credential stripping — trailing tokens matching the suffix/credential vocab are removed before lookup:
    • Suffixes: jr, sr, ii, iii, iv, v
    • Degrees: md, do, dds, dmd, dpm, dpt, pharmd, dnp, phd, dvm, od
    • Credentials: rn, aprn, crna, cnm, np, pa-c, pa, rph, rrt, rdn, lcsw, msw
    • Academic: mba, msc, ms, ma, ba, bsn, msn, dnsc, edd, jd
    • Fellowships: facs, facp, fasco, faan, faonl, nea-bc, ne-bc, ahn-bc, ccrn, cphq
    Dotted forms (M.D.) are normalized first by stripping periods.
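A minimal Python sketch of these normalization steps, using the vocabulary lists above verbatim. It assumes that commas separate a name from trailing credentials and that only periods and commas need stripping. It also guards against stripping the last remaining token, so a surname such as "Do" survives. These choices belong to the sketch, not necessarily to the deployed normalizer.

```python
import re
import unicodedata
from typing import List, Tuple

# Trailing-token vocabulary from the lists above (lowercase, periods removed).
SUFFIXES = {"jr", "sr", "ii", "iii", "iv", "v"}
DEGREES = {"md", "do", "dds", "dmd", "dpm", "dpt", "pharmd", "dnp", "phd", "dvm", "od"}
CREDENTIALS = {"rn", "aprn", "crna", "cnm", "np", "pa-c", "pa", "rph", "rrt", "rdn", "lcsw", "msw"}
ACADEMIC = {"mba", "msc", "ms", "ma", "ba", "bsn", "msn", "dnsc", "edd", "jd"}
FELLOWSHIPS = {"facs", "facp", "fasco", "faan", "faonl", "nea-bc", "ne-bc", "ahn-bc", "ccrn", "cphq"}
STRIPPABLE = SUFFIXES | DEGREES | CREDENTIALS | ACADEMIC | FELLOWSHIPS


def strip_diacritics(text: str) -> str:
    """José -> Jose, Müller -> Muller: NFD-decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_name(raw: str) -> Tuple[str, List[str]]:
    """Return (normalized name, stripped trailing tokens in original order)."""
    # Remove periods first so "M.D." becomes "md"; treat commas as separators.
    text = strip_diacritics(raw).lower().replace(".", "").replace(",", " ")
    tokens = text.split()  # split() also collapses whitespace runs and trims
    stripped: List[str] = []
    # Peel vocabulary tokens off the end, but never the last remaining token.
    while len(tokens) > 1 and tokens[-1] in STRIPPABLE:
        stripped.insert(0, tokens.pop())
    return " ".join(tokens), stripped
```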

Nickname expansion

A bidirectional table of ~110 canonical names with their common nicknames. The lookup walks both directions: input "Bob" expands to {Bob, Robert, Rob, Bobby, Robbie, Bert}, and input "Robert" expands to the same set. Both male and female forms are covered.

Sample of the nickname table:

| Canonical | Variants |
|---|---|
| Robert | Rob, Bob, Bobby, Robbie, Bert |
| William | Will, Bill, Billy, Willie, Liam |
| Richard | Rick, Dick, Rich, Richie, Rickey |
| Michael | Mike, Mikey, Mick, Mickey |
| James | Jim, Jimmy, Jamie, Jay |
| John | Jack, Johnny, Jon |
| Anthony | Tony, Ant |
| Elizabeth | Liz, Beth, Betty, Eliza, Lizzie, Ellie, Libby, Betsy, Bess |
| Margaret | Maggie, Meg, Peggy, Margie, Madge, Greta |
| Catherine | Cathy, Kate, Katie, Kit, Cat |
| Reynaldo | Rey, Ronnie, Naldo |

Full table at src/lib/name_variants.js lines 6–149.
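One way to make the lookup bidirectional is to index every name in a group, canonical and nicknames alike, to the whole group. The Python sketch below uses three sample rows. The index structure and the union step (so a nickname shared by two canonicals keeps both groups) are assumptions; it does not reproduce the contents of src/lib/name_variants.js.

```python
from typing import Dict, Iterable, Set

# A few rows from the sample table; the real table has ~110 canonical names.
NICKNAMES: Dict[str, Iterable[str]] = {
    "robert": ["rob", "bob", "bobby", "robbie", "bert"],
    "william": ["will", "bill", "billy", "willie", "liam"],
    "anthony": ["tony", "ant"],
}


def build_index(table: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map every canonical name AND every nickname to its full group."""
    index: Dict[str, Set[str]] = {}
    for canonical, variants in table.items():
        group = {canonical, *variants}
        for name in group:
            # Union rather than overwrite, so shared nicknames keep every group.
            index.setdefault(name, set()).update(group)
    return index


INDEX = build_index(NICKNAMES)


def expand_first_name(name: str) -> Set[str]:
    key = name.strip().lower()
    return set(INDEX.get(key, {key}))  # copy so callers cannot mutate the index
```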

Last-name variants

Hyphenated last names fan out:

"Smith-Jones" → ["Smith-Jones", "Smith Jones", "Smith", "Jones"]

Each variant is tried as a separate NPPES query. The function appends a trailing wildcard to every query; NPPES does not do this on its own (see "NPPES wildcard trick" below).
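The fan-out can be sketched in a few lines of Python. The function name is hypothetical. How surnames with more than one hyphen are split is not documented, so this sketch simply applies the same rule to every part.

```python
from typing import List


def last_name_variants(last: str) -> List[str]:
    """Fan a hyphenated surname out into the search variants listed above."""
    last = last.strip()
    parts = [p for p in last.split("-") if p]
    if len(parts) < 2:
        return [last]  # nothing to split
    variants = [last, " ".join(parts), *parts]
    return list(dict.fromkeys(variants))  # de-duplicate, keep first-seen order
```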

NPPES wildcard trick

NPPES's default search does exact matches on first_name and last_name. That misses compound first names ("SUSAN ANABELLE" stored as one field). The function appends a trailing * to both fields on every query, which catches compounds without breaking exact matches.
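A sketch of how one cascade query might be built against the NPPES Registry API (the `api/` endpoint with `version=2.1` and the `first_name`, `last_name`, `state`, and `limit` parameters). The wildcard placement follows the description above. The helper name and the `limit` value are assumptions.

```python
from typing import Optional
from urllib.parse import urlencode

NPPES_API = "https://npiregistry.cms.hhs.gov/api/"


def nppes_query_url(first: str, last: str, state: Optional[str] = None,
                    limit: int = 50) -> str:
    """Build one NPPES Registry API query with a trailing * on both name fields."""
    params = {
        "version": "2.1",            # NPPES API version 2.1
        "first_name": f"{first}*",   # trailing wildcard catches "SUSAN ANABELLE"
        "last_name": f"{last}*",
        "limit": limit,              # assumption: the screener's real page size is not documented
    }
    if state:
        params["state"] = state      # omitted entirely on the no-state pass
    # urlencode percent-encodes "*" as %2A; the server decodes it back to "*".
    return f"{NPPES_API}?{urlencode(params)}"
```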

The 5-pass cascade

  1. Exact name + state filter: the cheapest, most precise pass.
     Label: exact:<First> <Last>/<ST>
  2. Nickname × last-name variants, with state filter. Cross-product of first-name expansions × hyphen-split last names.
     Label: variant:<Nick> <LastVariant>/<ST>
  3. Drop the state filter for a broader net. Only fires if a state hint was given AND no candidate so far scored ≥ 0.80.
     Label: nostate:<First> <Last>
  4. First-initial + last-name + state. Catches badge data with just "R. Rivera" instead of "Reynaldo Rivera".
     Label: initial:R* <Last>/<ST>
  5. First/last transposition + state. Some badge data has the fields swapped, so this pass tries first=<Last>, last=<First>.
     Label: swap:<Last> <First>/<ST>

Early exit: if pass 1 produces any candidate with score ≥ 0.90, passes 2–5 are skipped entirely. Most well-formed inputs short-circuit here.
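The pass ordering and its two gates (the ≥ 0.90 early exit after pass 1, and the no-state pass that runs only when a state was given and nothing reached 0.80) can be sketched as follows. `search` is a hypothetical stand-in for one NPPES query plus scoring. Skipping the exact pair in pass 2, since pass 1 already ran it, is a choice made for this sketch.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical: one NPPES query plus scoring, returning [(npi, score), ...].
Search = Callable[[str, str, Optional[str]], List[Tuple[str, float]]]


def run_cascade(first: str, last: str, state: Optional[str],
                first_variants: Iterable[str], last_variants: Iterable[str],
                search: Search) -> Tuple[Dict[str, float], List[str]]:
    best: Dict[str, float] = {}  # npi -> best score seen in any pass
    tried: List[str] = []        # pass labels in the order they ran
    geo = f"/{state}" if state else ""
    lasts = list(last_variants)  # iterated once per first-name variant

    def run(label: str, fn: str, ln: str, st: Optional[str]) -> None:
        tried.append(label)
        for npi, score in search(fn, ln, st):
            best[npi] = max(score, best.get(npi, 0.0))

    def top() -> float:
        return max(best.values(), default=0.0)

    # Pass 1: exact name + state filter.
    run(f"exact:{first} {last}{geo}", first, last, state)
    if top() >= 0.90:
        return best, tried  # early exit: passes 2-5 skipped
    # Pass 2: first-name expansions x last-name variants (skip the pair pass 1 ran).
    for fn in first_variants:
        for ln in lasts:
            if (fn, ln) != (first, last):
                run(f"variant:{fn} {ln}{geo}", fn, ln, state)
    # Pass 3: drop the state filter, only if a state was given and nothing reached 0.80.
    if state and top() < 0.80:
        run(f"nostate:{first} {last}", first, last, None)
    # Pass 4: first initial + last name (the query builder appends the wildcard).
    run(f"initial:{first[0]}* {last}{geo}", first[0], last, state)
    # Pass 5: first and last names swapped.
    run(f"swap:{last} {first}{geo}", last, first, state)
    return best, tried
```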

Tier B — Web search + classifier

Provider-agnostic search layer (Brave / Serper / Google CSE). Used to corroborate Tier A medium-confidence matches and to identify HCPs that aren't in NPPES.

Provider selection

Driven by two env vars on the Netlify site, plus a third for Google CSE:

WEB_SEARCH_PROVIDER = "serper" | "brave" | "google_cse"
WEB_SEARCH_API_KEY  = "..."
GOOGLE_CSE_ID       = "..."   // only for google_cse

If the provider or its API key is not set, Tier B returns { available: false, reason: "..." } and the function falls through to a Tier-A-only response.

Currently configured: Serper.dev (paid plan, 50k queries/month).
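A minimal sketch of that selection logic, reading the three variables above from a mapping. The function name and the wording of each `reason` string are assumptions. Only the variable names, the provider values, and the `{ available: false, reason }` shape come from this section.

```python
import os
from typing import Dict, Mapping, Optional

SUPPORTED = {"serper", "brave", "google_cse"}


def select_provider(env: Optional[Mapping[str, str]] = None) -> Dict[str, object]:
    env = os.environ if env is None else env
    provider = (env.get("WEB_SEARCH_PROVIDER") or "").strip().lower()
    api_key = env.get("WEB_SEARCH_API_KEY") or ""
    if not provider or not api_key:
        return {"available": False, "reason": "WEB_SEARCH_PROVIDER / WEB_SEARCH_API_KEY not set"}
    if provider not in SUPPORTED:
        return {"available": False, "reason": f"unsupported provider: {provider}"}
    if provider == "google_cse" and not env.get("GOOGLE_CSE_ID"):
        return {"available": False, "reason": "GOOGLE_CSE_ID is required for google_cse"}
    return {"available": True, "provider": provider}
```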

The 5 queries Tier B fires per attendee

  1. Tightest: "<First Last>" "<City>" <ST> — name + quoted city + state. Catches local non-clinical context too (deliberately broad to detect "not_hcp").
  2. Name + city + clinical-role terms: "<First Last>" "<City>" <ST> doctor physician nurse pharmacist
  3. Name + employer + specialty: "<First Last>" <employer> <specialty> (only if those fields were provided)
  4. LinkedIn fallback: "<First Last>" "<City>" linkedin
  5. Typo tolerance: <First Last> <ST> doctor physician nurse pharmacist, unquoted, which lets Google auto-spell-correct.

Queries are deduplicated and any whose body is essentially just the name are dropped. The provider returns up to 5 hits per query.
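The five templates, the de-duplication, and the "just the name" filter can be sketched together. How missing fields are handled (blank parts are simply collapsed away) is an assumption. A useful side effect is that an omitted employer and specialty make query 3 collapse to the bare name, so the same filter drops it.

```python
from typing import Dict, List

CLINICAL_TERMS = "doctor physician nurse pharmacist"


def build_queries(inp: Dict[str, str]) -> List[str]:
    name = f"{inp['firstName']} {inp['lastName']}"
    city = f'"{inp["city"]}"' if inp.get("city") else ""  # city is quoted when present
    st = inp.get("state", "")
    employer = inp.get("organization", "")
    specialty = inp.get("specialty_hint", "")
    templates = [
        f'"{name}" {city} {st}',                   # 1. tightest: name + city + state
        f'"{name}" {city} {st} {CLINICAL_TERMS}',  # 2. + clinical-role terms
        f'"{name}" {employer} {specialty}',        # 3. + employer / specialty
        f'"{name}" {city} linkedin',               # 4. LinkedIn fallback
        f"{name} {st} {CLINICAL_TERMS}",           # 5. unquoted, so the engine may spell-correct
    ]
    queries: List[str] = []
    for q in templates:
        q = " ".join(q.split())  # collapse gaps left by missing fields
        if q.replace('"', "").strip().lower() == name.lower():
            continue  # the body is just the name: drop it
        queries.append(q)
    return list(dict.fromkeys(queries))  # de-duplicate, keep order
```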

Classifier — classifyHits()

Up to the first 15 aggregated hits are inspected. For each, the title + snippet are scanned for three regex patterns:

Positive degree pattern

/\b(MD|D\.O\.|DO|DDS|DMD|DPM|DPT|PharmD|DNP|PhD|DVM|OD)\b/

Positive credential pattern

/\b(RN|APRN|CRNA|CNM|NP|PA-C|PA|RPh|RRT|RDN|NEA-BC|FAAN|FACS|FACP)\b/

Negative-role pattern

/\b(VP|Vice President|CEO|CFO|COO|Founder|Investor|Sales|Marketing|Realtor|Real Estate|Attorney|Lawyer|Tenant|Tenants Association|Activist|Politician|Councilman|Mayor)\b/i
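The three patterns transcribe directly to Python's `re` module. In this sketch, `scan_hit` and the shape of its result are hypothetical. One transcription note: a `\b` after the trailing period of `D\.O\.` matches only when a letter or digit follows, so "D.O." before a space or at the end of a snippet is not caught by that alternative. This holds in both JavaScript and Python.

```python
import re
from typing import Dict, List

# The three patterns above; only the negative-role one is case-insensitive (/i).
DEGREE = re.compile(r"\b(MD|D\.O\.|DO|DDS|DMD|DPM|DPT|PharmD|DNP|PhD|DVM|OD)\b")
CREDENTIAL = re.compile(r"\b(RN|APRN|CRNA|CNM|NP|PA-C|PA|RPh|RRT|RDN|NEA-BC|FAAN|FACS|FACP)\b")
NEGATIVE_ROLE = re.compile(
    r"\b(VP|Vice President|CEO|CFO|COO|Founder|Investor|Sales|Marketing|"
    r"Realtor|Real Estate|Attorney|Lawyer|Tenant|Tenants Association|"
    r"Activist|Politician|Councilman|Mayor)\b",
    re.IGNORECASE,
)


def scan_hit(title: str, snippet: str) -> Dict[str, List[str]]:
    """Scan one hit's title + snippet; findall returns the captured group text."""
    text = f"{title} {snippet}"
    return {
        "degrees": DEGREE.findall(text),
        "credentials": CREDENTIAL.findall(text),
        "negative_roles": NEGATIVE_ROLE.findall(text),
    }
```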

Name verification — Levenshtein, both names required

Every positive hit must have both the input firstName and lastName fuzzy-matched to tokens in the snippet via Levenshtein distance. This was added 2026-05-18 after two false-positive incidents:

  • "Kevin Weller / Jersey City NJ" — was matching a snippet about "Kevin John Weller, APRN" in New Hampshire (different state).
  • "Joyce Moscaritols" (typo) — was matching "Dr. Michael Joyce, MD" because the input firstName "Joyce" appeared as someone else's last name.

Distance threshold per input name (the "needle" being searched for): min(3, max(1, ceil(needle.length × 0.30))). So a 6-letter name allows up to 2 edits; a 10-letter name allows up to 3.
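A Python sketch of the two-name check. It uses a standard two-row Levenshtein implementation and the threshold above, computed with integer ceiling division so that lengths like 10 are not affected by floating-point error. Lowercasing the snippet and splitting it into letter, hyphen, and apostrophe tokens is an assumption about tokenization. The last two tests replay the "Joyce Moscaritols" incident described earlier.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insert / delete / substitute), two-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute (0 if equal)
        prev = cur
    return prev[-1]


def max_edits(needle: str) -> int:
    """min(3, max(1, ceil(len × 0.30))), with an integer ceiling."""
    return min(3, max(1, (3 * len(needle) + 9) // 10))


def name_verified(first: str, last: str, snippet: str) -> bool:
    """Both input names must fuzzy-match some token in the snippet."""
    tokens = re.findall(r"[a-z'-]+", snippet.lower())

    def found(needle: str) -> bool:
        needle = needle.lower()
        return any(levenshtein(needle, t) <= max_edits(needle) for t in tokens)

    return found(first) and found(last)
```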

Geographic alignment

Each name-verified credential snippet is tagged:

  • aligned — the input state code OR full state name appears in the snippet, OR the input city appears in the snippet
  • mismatched — state codes appear in the snippet, but none match the input state
  • unknown — no state info detected

State-code matching uses comma/space context ("City, ST" patterns) to avoid false positives like "MS Excel."

Decision logic

if (neg > 0 && name_matched_aligned === 0)        → isHCP: false
else if (name_matched_aligned ≥ 1)                  → isHCP: true
else if (name_matched_count ≥ 2 && no mismatch
                            && no negatives)        → isHCP: true
else                                                → isHCP: false

The classifier also returns a spell_corrected_last_name field whenever the name-verified last-name match has distance > 0 — i.e., the snippet's lastName isn't identical to the input's. This is what triggers the Case 3b retry.
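The four branches transcribe directly into a function. In this Python sketch, the argument names follow the pseudocode, and how the counts are accumulated from the inspected hits is not shown. One consequence worth seeing concretely: a single aligned, name-verified credential hit outweighs any number of negative role signals.

```python
def is_hcp(negatives: int, name_matched_aligned: int,
           name_matched_count: int, any_mismatch: bool) -> bool:
    """Python transcription of the four-branch decision above."""
    if negatives > 0 and name_matched_aligned == 0:
        return False
    if name_matched_aligned >= 1:
        return True
    if name_matched_count >= 2 and not any_mismatch and negatives == 0:
        return True
    return False
```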

LLM enrichment layer (added v0.3.0)

After Tier A + Tier B settle on a disposition, the enrichment layer adds the fields listed below; its single live LLM call produces the judgment fields, notably position_function and call_recommendation. It runs on every disposition, including not_hcp and unknown, because the call recommendation is most useful for non-HCPs.

What it produces

  • highest_education — highest level of education / terminal degree. Deterministic from a clear NPPES credential (M.D., PharmD, DNP, Ph.D., …); otherwise inferred from web evidence (e.g. "Dean of the School of Medicine" → a doctorate). null only when there's no signal.
  • position_function — a 1–2 sentence plain-language explanation of what the identified position/specialty/credential typically does (core role & responsibility), for a non-clinical reader. null when no position was identified.
  • call_recommendation{ should_call, category, rationale, confidence }. Confirmed HCPs return should_call: true, category: "HCP". For non-HCPs, the model judges from the web evidence whether the person is still a valuable medical-affairs contact in the diabetes/obesity/cardiometabolic space — a professor of medicine, dean / department chair, KOL / thought leader, podcaster / online influencer, P&T or formulary committee member, professional-association member, patient-advocacy NGO/non-profit member, researcher/academic, or payer/managed-care. A current Novo Nordisk / NNI employee returns category: "Internal/Own-Company" (own-company colleague, not an outreach target). Clearly unrelated roles or namesakes return should_call: false, category: "Not Recommended".
  • social_profiles — an array of the person's reliably-attributed digital profiles: [{ platform, url, handle, followers, reliable, basis }]. Only profiles corroborated to this person (by name + employer/location/specialty) are included; likely namesakes are excluded. followers is populated only when a snippet states it.
  • dol: Digital Opinion Leader assessment, { is_dol, tier, topics, rationale, confidence }, where tier is one of none | emerging | established | leading. Grades digital influence in the diabetes/obesity/cardiometabolic space from the reliable profiles + evidence (audience size, posting cadence, podcast/video hosting, conference presence). A plain profile with no influence signal stays tier: "none".
  • identity_ids — resolved cross-reference ids { cdm_id, ims_id, npi_id } from the OSP identity service (looked up by last name + first name + ZIP). Any field may be null. The npi_id here independently corroborates the Tier-A NPI match.
  • prescribing_data — IQVIA-style prescribing for NNI cardiometabolic products, fetched by ims_id: { source, ims_id, total_trx, total_nbrx, prescribed[], products[] }. Any cardiometabolic prescribing (total_trx > 0) is treated by the recommendation gate as a direct in-scope tie — so a clinician in an adjacent specialty who actually prescribes (e.g. a neurologist on Ozempic) is recommended, while a zero-TRx result is affirmative evidence of non-relevance.

Specialty-relevance gate: a confirmed HCP is only should_call: true when their specialty is core to NNI's diabetes/obesity/cardiometabolic focus (endocrinology, primary care, cardiology, nephrology, obesity medicine) OR there is specific evidence of a cardiometabolic tie — prescribing data, or a documented sub-focus (e.g. diabetic neuropathy). HCPs in unrelated specialties return should_call: false with a potential_relevance note describing what would make them relevant. A current Novo Nordisk / NNI employee returns category: "Internal/Own-Company".

Configuration

Driven by env vars on the Netlify site:

LLM_PROVIDER   = "anthropic" | "openai"      // default: anthropic
LLM_API_KEY    = "..."                        // provider API key
LLM_MODEL      = "claude-haiku-4-5-20251001"  // default per provider
LLM_TIMEOUT_MS = "6000"                        // optional per-call timeout

Graceful degradation: if LLM_API_KEY is unset, or the call fails / times out / returns unparseable output, both fields come back null with a reason in their *_source field. The screening result is never blocked on the LLM. The call is also skipped (no spend) when there is no position and no web signal to reason about.

Identical prompts are de-duplicated by an ephemeral per-container memo (saves tokens during batch runs); this is process memory only, not persistence — the stateless guarantee holds.
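A sketch of the degrade-and-memoize behavior described above. It assumes a hypothetical `call_llm(prompt) -> str` client that enforces LLM_TIMEOUT_MS and raises on timeout or HTTP failure. The `unavailable: <reason>` and `llm_generated:<model>` strings follow the *_source conventions in the field reference; the function name and exact reason texts are illustrative.

```python
import json
from typing import Callable, Dict, Optional, Tuple

# Ephemeral per-process memo: gone when the warm container is recycled.
_MEMO: Dict[str, dict] = {}


def enrich(prompt: str, call_llm: Callable[[str], str], model: str,
           api_key: Optional[str]) -> Tuple[Optional[dict], str]:
    """Return (parsed enrichment or None, *_source string). Never raises."""
    if not api_key:
        return None, "unavailable: LLM_API_KEY not set"
    if prompt in _MEMO:  # identical prompt already answered in this process
        return _MEMO[prompt], f"llm_generated:{model}"
    try:
        parsed = json.loads(call_llm(prompt))
    except Exception as exc:  # timeout, HTTP error, or unparseable output
        return None, f"unavailable: {type(exc).__name__}"
    if not isinstance(parsed, dict):
        return None, "unavailable: unexpected output shape"
    _MEMO[prompt] = parsed
    return parsed, f"llm_generated:{model}"
```

Because the dict lives only as long as the process, it saves tokens within a batch without persisting anything. A production version would likely also cap its size.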

Spell-correction retry (Case 3b)

The deepest sequential branch in the pipeline — tiers_run: ["A", "B", "A-retry"].

When Tier B's classifier emits a spell_corrected_last_name that differs from the input's lastName (case-insensitive), the function:

  1. Builds correctedInput = { ...input, lastName: spell_corrected_last_name }
  2. Calls tierA_npiFuzzy(correctedInput) — a fresh 5-pass NPPES cascade with the corrected name
  3. If the retry's top candidate scores ≥ 0.65, returns the corrected match with source: "npi_fuzzy_spell_corrected" and a spell_corrected: { from, to } field
  4. Otherwise falls through to Cases 1–6 with the original input

This is what makes typo'd badge data recoverable. Input "Moscaritols" still returns the correct NPI record for Dr. Joyce Moscarito, with the corrected spelling labeled in the response so the consumer knows it was repaired.
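The four retry steps can be sketched as one function. `tier_a` is a hypothetical stand-in for tierA_npiFuzzy that returns the best match (with a `confidence`) or None. Returning None here means "fall through to Cases 1–6".

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical stand-in for tierA_npiFuzzy: input dict -> best match dict or None.
TierA = Callable[[Dict[str, Any]], Optional[Dict[str, Any]]]


def spell_correction_retry(inp: Dict[str, Any], corrected_last: Optional[str],
                           tier_a: TierA) -> Optional[Dict[str, Any]]:
    """Return a Case 3b match, or None to fall through to Cases 1-6."""
    if not corrected_last or corrected_last.lower() == inp["lastName"].lower():
        return None  # no correction, or only a case difference
    corrected_input = {**inp, "lastName": corrected_last}
    retry = tier_a(corrected_input)  # fresh 5-pass cascade with the corrected name
    if retry is None or retry.get("confidence", 0) < 0.65:
        return None
    return {
        **retry,
        "status": "matched",
        "source": "npi_fuzzy_spell_corrected",
        "spell_corrected": {"from": inp["lastName"], "to": corrected_last},
    }
```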

Scoring & confidence

Every NPI candidate gets a score in [0, 1]. The base is 0.5 (caller already filtered by name); signals add or subtract.

Signal weights

| Signal | When it fires | Δ score |
|---|---|---|
| state_match | Input state == NPI practice OR mailing state | +0.20 |
| state_mismatch | Both input and NPI have states, and they differ | −0.25 |
| city_match | Input city (lowercased) == NPI practice/mailing city | +0.10 |
| zip_match | Input ZIP (5-digit) == NPI practice ZIP (5-digit) | +0.10 |
| specialty_match | Input specialty_hint appears in NPI primary taxonomy description (either direction) | +0.15 |
| specialty_soft_mismatch | Both have specialty info but no substring overlap | −0.03 |
| employer_substring | Input employer substring-matches NPI practice address_1 | +0.08 |

Final score is clamped to [0, 1].
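For example, a candidate that fires state, city, ZIP, and specialty matches sums to 0.5 + 0.20 + 0.10 + 0.10 + 0.15 = 1.05 and is clamped to 1.0. A Python sketch of the weighting follows. The flat candidate keys (`practice_state`, `taxonomy`, `address_1`, and so on) are hypothetical simplifications of the NPPES record.

```python
from typing import Dict, List, Optional, Tuple

WEIGHTS = {
    "state_match": 0.20,
    "state_mismatch": -0.25,
    "city_match": 0.10,
    "zip_match": 0.10,
    "specialty_match": 0.15,
    "specialty_soft_mismatch": -0.03,
    "employer_substring": 0.08,
}


def _norm(value: Optional[str]) -> str:
    return (value or "").strip().lower()


def _zip5(value: Optional[str]) -> str:
    return "".join(ch for ch in (value or "") if ch.isdigit())[:5]


def score_candidate(inp: Dict[str, str], cand: Dict[str, str]) -> Tuple[float, List[str]]:
    """Base 0.5 plus the weight of every signal that fires, clamped to [0, 1]."""
    signals: List[str] = []

    state = _norm(inp.get("state"))
    npi_states = {_norm(cand.get("practice_state")), _norm(cand.get("mailing_state"))} - {""}
    if state and npi_states:
        signals.append("state_match" if state in npi_states else "state_mismatch")

    city = _norm(inp.get("city"))
    if city and city in {_norm(cand.get("practice_city")), _norm(cand.get("mailing_city"))}:
        signals.append("city_match")

    zip_in = _zip5(inp.get("postalCode"))
    if zip_in and zip_in == _zip5(cand.get("practice_zip")):
        signals.append("zip_match")

    hint, taxonomy = _norm(inp.get("specialty_hint")), _norm(cand.get("taxonomy"))
    if hint and taxonomy:
        overlap = hint in taxonomy or taxonomy in hint  # substring in either direction
        signals.append("specialty_match" if overlap else "specialty_soft_mismatch")

    employer, address_1 = _norm(inp.get("organization")), _norm(cand.get("address_1"))
    if employer and address_1 and employer in address_1:
        signals.append("employer_substring")

    raw = 0.5 + sum(WEIGHTS[s] for s in signals)
    return round(min(1.0, max(0.0, raw)), 4), signals  # round() hides float noise
```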

Disposition thresholds

| Top score | Runner-up gap | Disposition |
|---|---|---|
| ≥ 0.85 | ≥ 0.15 ahead of #2 | matched, high confidence (Case 0 short-circuit) |
| ≥ 0.85 | < 0.15 ahead of #2 | Falls through to Tier B (ambiguous top hit) |
| 0.65 – 0.84 | any | matched, medium confidence (Cases 1 / 4) |
| < 0.65 (but > 0) | any | inconclusive (Case 5) |
| 0 | n/a | unknown (Case 6) |
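The threshold table maps onto a small function over the Tier A scores. The return labels (`matched_high`, `ambiguous_to_tier_b`, and so on) are hypothetical names for the table rows. Rounding the gap guards against results like 0.95 − 0.80 evaluating to slightly less than 0.15 in binary floating point.

```python
from typing import Sequence


def tier_a_disposition(scores: Sequence[float]) -> str:
    """Map Tier A candidate scores (any order) onto the threshold table rows."""
    ranked = sorted(scores, reverse=True)
    if not ranked or ranked[0] <= 0:
        return "unknown"                          # Case 6
    top = ranked[0]
    runner_up = ranked[1] if len(ranked) > 1 else 0.0
    if top >= 0.85:
        # Short-circuit only when the top hit is also clearly ahead of #2.
        gap = round(top - runner_up, 6)
        return "matched_high" if gap >= 0.15 else "ambiguous_to_tier_b"
    if top >= 0.65:
        return "matched_medium"                   # Cases 1 / 4
    return "inconclusive"                         # Case 5
```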

Full pipeline

The complete sequence of branches from input to response.

  1. Pre-flight (screen.js:22-36): CORS preflight (OPTIONS → 204) · Method gate (POST only) · Auth check (X-API-Key)
  2. Normalize (screen.js:39-68): Parse JSON · validate firstName + lastName · extractCredentials · normalize
  3. Tier A (screen.js:75 → npi_fuzzy.js): tierA_npiFuzzy(input), the 5-pass cascade against NPPES
    • Case 0 (screen.js:83-95): if aDisp.confidence ≥ 0.85 → return matched Case 0 immediately
  4. Tier B (screen.js:103 → web_search.js): tierB_webSearch(input), 5 Serper queries
  5. Tier B classifier (screen.js:105 → web_search.js): classifyHits(tierB.hits, input), covering name verification + geo alignment + degree/credential extraction
  6. Tier A retry (screen.js:110-129): if the classifier flagged spell_corrected_last_name ≠ input.lastName → tierA_npiFuzzy with the corrected name
    • Case 3b (screen.js:115-128): if retry confidence ≥ 0.65 → return matched Case 3b with spell_corrected annotation
  7. Final disposition (screen.js:140-246): evaluate Cases 1 → 2 → 3 → 4 → 5 → 6 in order and return the first that matches

Maximum external HTTP calls per screening

| Source | Max calls | When |
|---|---|---|
| NPPES (Tier A #1) | ~10 | One per name variant: exact, nicknames, hyphen splits, no-state, initial, swap |
| Serper (Tier B) | 5 | 4 quoted variants + 1 unquoted spell-tolerant |
| NPPES (Tier A retry) | ~10 | Only fires on Case 3b spell-correction |
| Total worst case | ~25 outbound HTTPS calls | Per single screening |

HTTP status codes

| Code | When |
|---|---|
| 200 + ok:true | Screening completed: any of matched / inconclusive / not_hcp / unknown |
| 204 | CORS preflight (OPTIONS) |
| 400 | Malformed JSON body, or firstName / lastName missing |
| 401 | Missing or wrong X-API-Key header |
| 405 | Any method other than POST or OPTIONS |
| 502 | Tier A error (NPPES upstream failure). Tier B errors do not propagate as 502; they degrade the response to Tier-A-only, with reasoning that notes Tier B was unavailable. |

Stateless guarantee

No persistence anywhere. Each request is independent.

This is an explicit project decision dated 2026-05-18. No database, no call logs, no audit trail. The function code has no fs.writeFile, no Supabase client, no logging service. The only write the function performs is the response body itself.

If/when analytics or compliance auditing is needed, that's a meaningful architectural decision that requires explicit sign-off — not a one-liner addition.
