Methodology: how StackPatrol detects and classifies vendors

Crawl: Headless Chromium loads your URL
Record: Every network request captured
Match: 728 known vendor patterns
Score: EU Independence Score computed

1. Crawling

We launch a headless Chromium browser (via Playwright) with a desktop User-Agent and a 1366×800 viewport. We then load the URL you provided and wait for the page to reach networkidle for up to six seconds.

To improve coverage we also try to scan one additional internal page. We pick the most “interesting” same-domain link we can find, typically checkout, contact, pricing, signup or login, because those pages often load payment, support and conversion-tracking scripts that the front page does not.
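As a rough illustration of the link selection described above, the sketch below ranks candidate links by keyword. It is a minimal sketch, not the production logic: the function name `pick_internal_page`, the priority order of the keywords, and the use of an exact hostname comparison as a stand-in for "same-domain" are all assumptions.

```python
from typing import List, Optional
from urllib.parse import urljoin, urlparse

# Keywords come from the text above; treating their order as a priority is an assumption.
INTERESTING = ("checkout", "contact", "pricing", "signup", "login")


def pick_internal_page(base_url: str, hrefs: List[str]) -> Optional[str]:
    """Return the same-host link whose path contains the highest-priority keyword."""
    base_host = urlparse(base_url).hostname
    best_rank, best_url = len(INTERESTING), None
    for href in hrefs:
        url = urljoin(base_url, href)  # resolve relative links against the entry page
        parsed = urlparse(url)
        # Skip mailto:, javascript: and links that leave the scanned host.
        if parsed.scheme not in ("http", "https") or parsed.hostname != base_host:
            continue
        path = parsed.path.lower()
        for rank, keyword in enumerate(INTERESTING):
            if keyword in path and rank < best_rank:
                best_rank, best_url = rank, url
                break
    return best_url
```

With links such as `/about`, `/pricing` and `/checkout/cart`, this sketch would return the checkout URL, and `None` when no link mentions any keyword.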

Free scans load the page once and record what the browser does without any interaction — no logins, no form submissions, no consent-banner clicks. Paid plans automatically run consent-aware scanning: we re-open the site in a clean browser context, click an “Accept all” button on the consent banner, then revisit the entry page plus up to four of the baseline subpages in the same context (so the consent cookies and localStorage flags persist). The report shows a side-by-side delta of vendors, cookies and third-party domains that only load after consent.

The consent clicker searches both the main page and likely CMP iframes (Sourcepoint, OneTrust, Cookiebot, Didomi, TrustArc, Quantcast, Usercentrics, Iubenda, Klaro, Osano, Borlabs, Complianz) using a curated list of vendor selectors and accept-all button labels in ~15 languages. When deterministic selectors miss, an optional LLM fallback (gpt-4o-mini, cached per host) identifies the accept button from a pruned list of visible elements. Against an 18-site smoke test of major European publishers, the current hit rate is around 89%.

2. Recording requests

For every network request the browser makes we record the URL, the host and the resource type. A request is classified as third-party when its registrable domain is different from the registrable domain of the scanned site.

We use a small last-two-labels heuristic with a list of common two-part TLDs (.co.uk, .com.au, .co.jp, …) to determine the registrable domain.
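The heuristic and the third-party test can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the set of two-part TLDs contains just the three suffixes named above, not the full list.

```python
from urllib.parse import urlparse

# Only the suffixes named in the text; the real list is longer and not reproduced here.
TWO_PART_TLDS = {"co.uk", "com.au", "co.jp"}


def registrable_domain(host: str) -> str:
    """Last two labels, or last three when the host ends in a known two-part TLD."""
    labels = host.lower().rstrip(".").split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in TWO_PART_TLDS:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def is_third_party(request_url: str, site_url: str) -> bool:
    """A request is third-party when its registrable domain differs from the site's."""
    req_host = urlparse(request_url).hostname or ""
    site_host = urlparse(site_url).hostname or ""
    return registrable_domain(req_host) != registrable_domain(site_host)
```

Under this sketch, `cdn.example.com` and `www.example.com` share the registrable domain `example.com`, so a request between them counts as first-party.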

On paid plans we also tag each third-party request with the page it fired on and whether it ran before or after consent. This provenance powers the “Found on” section under each vendor in the report, so you can point to the exact subpage a tracker loaded on — and whether it fired without consent — instead of just knowing it exists somewhere on the site.

3. Vendor matching

We maintain a curated database of 728 third-party vendors (browse the directory). Each vendor entry contains one or more domain patterns.

A request matches a vendor when:

its hostname equals the pattern, or
its hostname ends with . + pattern (suffix match), or
for the few non-domain patterns we use, the full URL contains the pattern.

When a request matches multiple patterns we pick the most specific (longest) one. Domains that don’t match any vendor are listed as unmatched.
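The three matching rules and the longest-pattern tie-break could look roughly like this. The pattern table is invented for illustration, and treating any pattern that contains a slash as a "non-domain pattern" is an assumption; the text does not say how the two kinds are told apart.

```python
from typing import Dict, Optional
from urllib.parse import urlparse

# Hypothetical patterns and vendor names, not entries from the real database.
PATTERNS: Dict[str, str] = {
    "example-analytics.com": "Example Analytics",
    "cdn.example-analytics.com": "Example Analytics CDN",
    "/pixel/track.js": "Example Pixel",
}


def match_vendor(url: str, patterns: Dict[str, str] = PATTERNS) -> Optional[str]:
    """Return the vendor for the longest matching pattern, or None if unmatched."""
    host = (urlparse(url).hostname or "").lower()
    hits = []
    for pattern in patterns:
        if "/" in pattern:
            # Assumed marker for a non-domain pattern: match anywhere in the full URL.
            if pattern in url:
                hits.append(pattern)
        elif host == pattern or host.endswith("." + pattern):
            # Exact hostname match or suffix match on a label boundary.
            hits.append(pattern)
    if not hits:
        return None
    return patterns[max(hits, key=len)]
```

Requiring the leading dot in the suffix match is what keeps a host like `notexample-analytics.com` from matching `example-analytics.com`.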

4. Region classification

Each vendor is classified by ownership region. The classification is based on where the parent company is incorporated, not where data physically resides. A US-owned vendor with EU data centres is still classified as US because data-access requests (FISA 702, Cloud Act) are governed by ownership.

US: FISA 702 / Cloud Act
EU: GDPR home jurisdiction
EEA / UK: Adequacy decisions
China: Data law concerns

5. The EU Independence Score

The score is an experimental signal, not a compliance rating. It starts at 100 and four penalty components are subtracted:

Score = 100 − P_vendor − P_mix − P_unknown − P_infra

P_vendor — jurisdiction risk × category

EU / EEA: GDPR home jurisdiction
Switzerland / UK: Adequacy decision in place, base risk 2–3
Global / Unknown owner: Jurisdiction unclear, base risk 5–6
US-owned: FISA 702 / Cloud Act jurisdiction
China-owned: PIPL / national security law jurisdiction

Each base risk is multiplied by a category weight. A US-owned tag manager (×1.8) is penalised more heavily than a US-owned font (×0.6). The product is capped at 20 per vendor, and the sum across all vendors (P_vendor) is capped at 60.

Data Privacy Framework nuance: for US-owned vendors we check EU–US Data Privacy Framework certification — the common lawful transfer basis after Schrems II. A certified vendor’s penalty is nudged down (×0.85); an uncertified one up (×1.15). Unknown status leaves the penalty unchanged.

Fonts / static assets: ×0.6
JS library / CDN: ×0.7
Analytics / error tracking: ×1.2
Payments / auth: ×1.3
Advertising / retargeting: ×1.6
Tag management: ×1.8
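Putting the category weights, the Data Privacy Framework adjustment and the two caps together, P_vendor might be computed along these lines. This is a sketch under assumptions: the dictionary keys and function names are hypothetical, the base risk is passed in by the caller, and applying the DPF factor before the per-vendor cap is a guess about the order of operations.

```python
from typing import Dict, List

# Weights from the table above; the key names are illustrative.
CATEGORY_WEIGHT: Dict[str, float] = {
    "fonts_static": 0.6,
    "js_cdn": 0.7,
    "analytics": 1.2,
    "payments_auth": 1.3,
    "advertising": 1.6,
    "tag_management": 1.8,
}
DPF_FACTOR = {"certified": 0.85, "uncertified": 1.15, "unknown": 1.0}
PER_VENDOR_CAP = 20.0
TOTAL_CAP = 60.0


def vendor_penalty(base_risk: float, category: str, region: str,
                   dpf_status: str = "unknown") -> float:
    """Base risk times category weight, DPF-adjusted for US vendors, capped at 20."""
    penalty = base_risk * CATEGORY_WEIGHT[category]
    if region == "US":
        # Assumption: the DPF nudge is applied before the per-vendor cap.
        penalty *= DPF_FACTOR[dpf_status]
    return min(penalty, PER_VENDOR_CAP)


def p_vendor(vendors: List[dict]) -> float:
    """Sum of per-vendor penalties, capped at 60 in total."""
    return min(sum(vendor_penalty(**v) for v in vendors), TOTAL_CAP)
```

For a purely hypothetical base risk of 10, a US-owned tag manager would contribute about 18 with unknown DPF status and about 15.3 when certified.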

P_mix — non-EU ratio penalty (max −20)

When significantly non-EU vendors (US, China, Global) make up a large share of the classified vendor stack, an additional penalty applies — up to −20. The penalty is scaled by sample size so a single finding on a short scan does not over-fire.

Example: 3 US vendors, 1 EU vendor = 75% non-EU ratio → P_mix ≈ 15
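The exact sample-size scaling is not spelled out here, so the following sketch uses a placeholder: a linear ramp that reaches full weight at four classified vendors. That ramp, the constant `FULL_WEIGHT_AT` and the function name are assumptions chosen only so that the example above is reproduced; the real curve may differ, and any minimum ratio threshold is ignored.

```python
MAX_MIX_PENALTY = 20.0
FULL_WEIGHT_AT = 4  # hypothetical: number of classified vendors at which damping stops


def p_mix(significantly_non_eu: int, total_classified: int) -> float:
    """Non-EU share of the classified stack, scaled to 0..20 and damped for small samples."""
    if total_classified == 0:
        return 0.0
    ratio = significantly_non_eu / total_classified
    damping = min(1.0, total_classified / FULL_WEIGHT_AT)
    return MAX_MIX_PENALTY * ratio * damping
```

With 3 non-EU vendors out of 4 classified, the sketch gives 15, matching the example. A single non-EU vendor on its own gives 5 rather than the full 20, which is the kind of damping the text describes.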

P_unknown — unmatched domains (−4 each, max −25)

Domains not found in our vendor database reduce both the score and our confidence in the result. High confidence requires that under 15% of third-party domains are unclassified.
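The unmatched-domain penalty and the 15% confidence threshold are simple enough to state directly. The function names are hypothetical, and treating a scan with no third-party domains as meeting the threshold is an assumption.

```python
def p_unknown(unmatched_domains: int) -> int:
    """4 points per unmatched domain, capped at 25."""
    return min(4 * unmatched_domains, 25)


def meets_high_confidence_threshold(unmatched_domains: int, third_party_domains: int) -> bool:
    """True when strictly fewer than 15% of third-party domains are unclassified."""
    if third_party_domains == 0:
        return True  # assumption: nothing to classify means nothing is unclassified
    # Integer arithmetic avoids floating-point surprises right at the 15% boundary.
    return unmatched_domains * 100 < 15 * third_party_domains
```

Three unmatched domains cost 12 points; from seven onwards the penalty stays at the cap of 25.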

P_infra — non-EU infrastructure (max −22)

A site can run its trackers from the EU yet still host itself, its email or its DNS on a non-EU-owned provider. That is a first-order sovereignty concern (CLOUD Act / FISA 702) independent of the third-party stack, so we resolve the origin hosting, email (MX) and authoritative DNS (NS) providers and classify each by ownership region.

Hosting: −8
Email (MX): −6 (max −10)
DNS (NS): −3 (max −6)

Only significantly non-EU providers (US, China, Global, Unknown) count. When infrastructure can’t be resolved, P_infra is 0 — it never penalises a scan for missing data.
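One way to read the infrastructure table is sketched below. It assumes the email and DNS penalties apply per non-EU provider up to their own caps, and that an unresolved lookup is represented as `None` or an empty list; both are interpretations, as are the function and parameter names.

```python
from typing import Optional, Sequence

NON_EU = {"US", "China", "Global", "Unknown"}


def p_infra(hosting: Optional[str],
            mx: Sequence[str] = (),
            ns: Sequence[str] = ()) -> int:
    """Penalty for non-EU hosting, MX and NS providers, capped at 22 overall.

    Each argument holds ownership regions; None or an empty sequence means the
    lookup could not be resolved and contributes nothing.
    """
    hosting_pen = 8 if hosting in NON_EU else 0
    mx_pen = min(6 * sum(region in NON_EU for region in mx), 10)
    ns_pen = min(3 * sum(region in NON_EU for region in ns), 6)
    return min(hosting_pen + mx_pen + ns_pen, 22)
```

A site with US-owned hosting, one US-owned mail provider and one US-owned DNS provider would score 8 + 6 + 3 = 17 under this reading.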

Label guardrails

Labels are not derived purely from the numeric score. Hard rules prevent misleading labels even when the score is numerically high. For example, Mostly EU independent is blocked if non-EU vendors outnumber EU/EEA vendors, regardless of the score.

EU-first stack: 90–100
Mostly EU independent: 75–89
EU-leaning, with dependencies: 60–74
Mixed third-party stack: 40–59
High non-EU dependency: 20–39
Heavily non-EU dependent: 0–19
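The bands above, together with the one guardrail named in the text, can be sketched as a lookup followed by a rule check. Only that single guardrail is modelled. Falling back to the next lower band when it fires, and clamping the score to 0–100, are assumptions; the function name is hypothetical.

```python
# (lower bound, label) pairs, highest band first.
BANDS = [
    (90, "EU-first stack"),
    (75, "Mostly EU independent"),
    (60, "EU-leaning, with dependencies"),
    (40, "Mixed third-party stack"),
    (20, "High non-EU dependency"),
    (0, "Heavily non-EU dependent"),
]


def label_for(score: float, non_eu_vendors: int, eu_vendors: int) -> str:
    """Map a score to its band, then apply the 'Mostly EU independent' guardrail."""
    clamped = max(0, min(100, score))  # assumption: scores are kept within 0..100
    index = next(i for i, (floor, _) in enumerate(BANDS) if clamped >= floor)
    if BANDS[index][1] == "Mostly EU independent" and non_eu_vendors > eu_vendors:
        index += 1  # assumption: a blocked label falls back to the next lower band
    return BANDS[index][1]
```

A score of 80 with three non-EU vendors and one EU vendor would therefore be labelled "EU-leaning, with dependencies" rather than "Mostly EU independent".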

The score card in your report shows a full breakdown so you can see exactly how each component contributed, along with a confidence indicator.

6. What we don't do

We don’t determine GDPR or DSA compliance.
We don’t click consent banners on free scans (paid plans run an automatic post-consent pass).
We don’t crawl the entire site by default (front page + one internal link on free, up to 5 pages on Pro, up to 20 on Agency).
We don’t log in to authenticated areas or fill out forms.
We don’t store IP addresses in plaintext; they are salted and hashed daily.

7. Limitations

Consent-gated scripts are a significant blind spot on the free scan, because many tracking scripts only load after a user accepts a consent banner. Paid plans automatically add a second pass that clicks “Accept all” on the consent banner and then revisits the entry page plus up to four baseline subpages in the same browser context, which catches vendors that only fire post-consent on article or product pages. The click uses vendor-specific selectors, multilingual button labels, iframe traversal for Sourcepoint-style CMPs, and an optional LLM fallback. It is best-effort: when no banner is detected, or one is detected but is not actionable, the report flags this explicitly so you know whether the scan saw the full vendor picture.

Geographic bias also matters: some vendors serve different scripts based on the visitor’s country. We currently scan from a European IP, so results approximate what European visitors see.

The methodology evolves as we improve coverage. If you find a wrong classification, please let us know.
