After getting the basic plugin architecture working (covered in Part 1), I turned my attention to the dealer network sources — the ones with the most inventory depth and the most price variance. These are also the most aggressively protected against automated access.
My first attempt: standard httpx requests with a realistic user-agent string. Blocked on the first request at two out of three dealer sites. The third returned a 200 but served a challenge page instead of the catalog.
Okay, fine. I'd use Playwright.
Why Headless Chrome Gets Fingerprinted
A headless Chromium browser isn't actually invisible. Browser fingerprinting scripts check a surprisingly detailed set of properties to distinguish real users from bots:
JavaScript properties. navigator.webdriver is true in Playwright-controlled browsers by default. Easy to set to false — but fingerprinting scripts don't stop there. They check navigator.plugins, which is typically empty in headless environments. They check navigator.languages, navigator.hardwareConcurrency, screen.colorDepth. They check Chrome-specific properties like window.chrome.runtime.
Canvas and WebGL fingerprinting. The same canvas drawing operations produce slightly different pixel output on different hardware and OS combinations. Headless environments produce a characteristic fingerprint that doesn't match any common real-user configuration.
TLS fingerprinting. At the connection level, the TLS ClientHello can be reduced to a JA3 fingerprint: a hash of the TLS version, cipher suite order, extension list, supported curves, and point formats the client offers. Chromium's TLS fingerprint is well-known. Some WAFs block based on JA3 alone, before the HTTP layer.
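To make the JA3 idea concrete, here is a minimal sketch of how such a fingerprint is derived, following the published JA3 recipe: five ClientHello fields are joined into a string and hashed with MD5. The function names are my own, and the numeric values in the usage lines are made up for illustration, not captured from a real browser.

```python
import hashlib

# GREASE placeholder values (RFC 8701) are dropped before hashing,
# because clients insert them at random: 0x0A0A, 0x1A1A, ..., 0xFAFA.
GREASE = {0x0A0A + 0x1010 * i for i in range(16)}


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the comma/dash-separated JA3 string from ClientHello fields."""
    fields = [
        [version],
        [c for c in ciphers if c not in GREASE],
        [e for e in extensions if e not in GREASE],
        [g for g in curves if g not in GREASE],
        point_formats,
    ]
    # Values inside a field are joined with "-", fields with ",".
    return ",".join("-".join(str(v) for v in field) for field in fields)


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """MD5 hex digest of the JA3 string; this is the value a WAF compares."""
    raw = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(raw.encode("ascii")).hexdigest()


# Illustrative values only: the same fields in a different order hash differently.
original = ja3_hash(771, [4865, 4866], [0, 23, 65281], [29, 23], [0])
reordered = ja3_hash(771, [4866, 4865], [0, 23, 65281], [29, 23], [0])
print(original != reordered)
```

Because order is part of the input, two clients offering the same cipher suites in a different sequence get different hashes, which is why the TLS library matters more than the user-agent string.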
Behavioral signals. Real users move the mouse. They scroll. Their timing between requests follows human patterns. A scraper that fires requests at perfectly regular intervals or without any prior mouse activity is statistically distinguishable from a real user.
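As a rough illustration of why perfectly regular timing stands out, the sketch below computes the coefficient of variation of the gaps between requests. This is my own simplification, not how any particular WAF works; the function names and the 0.1 threshold are hypothetical.

```python
from statistics import mean, pstdev


def interval_cv(timestamps):
    """Coefficient of variation (stdev / mean) of the gaps between requests.

    Assumes timestamps are in seconds and strictly increasing.
    """
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three timestamps")
    return pstdev(gaps) / mean(gaps)


def looks_scripted(timestamps, threshold=0.1):
    """Flag traffic whose request spacing is suspiciously uniform.

    The threshold is an arbitrary illustrative value.
    """
    return interval_cv(timestamps) < threshold


fixed_interval = [0.0, 2.0, 4.0, 6.0, 8.0]   # a loop with sleep(2)
irregular = [0.0, 1.2, 5.0, 5.9, 11.3]       # uneven, human-like gaps
print(looks_scripted(fixed_interval), looks_scripted(irregular))
```

A fixed-interval loop has zero variance in its gaps, so even a check this crude separates it from uneven traffic.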
I tried playwright-stealth — a library that patches many of the JavaScript-level properties. It helped. My bypass rate went from ~20% to ~55%. Not good enough.
Camoufox
Camoufox is a modified Firefox build specifically designed to defeat browser fingerprinting. It addresses the problem at a lower level than stealth patches:
Randomized fingerprint properties on each launch. Canvas rendering output, WebGL extension lists, audio context properties, hardware concurrency — all randomized within plausible real-user ranges. No two Camoufox sessions produce the same fingerprint.
TLS fingerprinting handled at the connection layer. Camoufox uses Firefox's TLS stack rather than Chromium's, which has a different JA3 fingerprint, and it randomizes the extension ordering to produce different fingerprints across sessions.
Realistic behavior generation. Mouse movement patterns, scroll events, and timing are modeled on real user behavior rather than the uniform timing typical of automated scripts.
Switching to Camoufox brought my dealer site bypass rate from ~55% (playwright-stealth) to >95%. All three sites that had been blocking or challenging me started returning catalog results.
The Implementation
Camoufox integrates with Playwright's API, so I didn't have to rewrite the plugins — just swap the browser context:
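from contextlib import asynccontextmanager
from camoufox.async_api import AsyncCamoufox

@asynccontextmanager
async def get_browser_context():
    # Yield rather than return: leaving the async with block closes the browser.
    async with AsyncCamoufox(headless=True, os="macos") as browser:
        context = await browser.new_context()
        yield context

The os="macos" parameter tells Camoufox to generate a fingerprint consistent with macOS, matching the machine I'm running on. That avoids the inconsistency of a Windows fingerprint coming from a machine whose other observable signals, such as installed fonts, say macOS.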
I added a session pool: rather than spawning a new browser context per request, I maintain a pool of 3–4 persistent contexts. Each context accumulates some browsing history (a few page loads before hitting the target site) to look less like a fresh session. This improved reliability on the most aggressive WAF configurations.
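A minimal sketch of what such a pool can look like, assuming the contexts are created elsewhere and passed in. ContextPool and warm_up are hypothetical names, not part of Camoufox or Playwright; the only Playwright calls used are new_page(), goto(), and close(), and the warm-up URLs are whatever the caller chooses.

```python
import asyncio
from contextlib import asynccontextmanager


class ContextPool:
    """Hands out a fixed set of long-lived browser contexts, one borrower each.

    Create the pool inside a running event loop.
    """

    def __init__(self, contexts):
        self._idle = asyncio.Queue()
        for ctx in contexts:
            self._idle.put_nowait(ctx)

    @asynccontextmanager
    async def acquire(self):
        ctx = await self._idle.get()      # waits while every context is busy
        try:
            yield ctx
        finally:
            self._idle.put_nowait(ctx)    # hand it back even if the scrape failed


async def warm_up(context, urls):
    """Load a few pages so the context carries some history before real use."""
    page = await context.new_page()
    try:
        for url in urls:
            await page.goto(url)
    finally:
        await page.close()
```

A plugin would then wrap each fetch in `async with pool.acquire() as context:`, so no more requests run at once than there are contexts in the pool.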
Rate Limiting and Cooling
Even with Camoufox, I'm careful not to hammer the same site repeatedly. Each plugin has a configurable rate limiter — minimum seconds between requests to the same domain, with jitter applied to avoid perfectly regular timing patterns:
import asyncio
import random

async def rate_limited_request(url: str, min_delay: float = 2.0):
    # Jitter only stretches the delay, so min_delay stays a true minimum.
    jitter = random.uniform(1.0, 1.5)
    await asyncio.sleep(min_delay * jitter)
    # make request
The delay is per-domain, not global. If plugin A is waiting for a rate limit on site X, plugins B through N can still be running concurrently.
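One way to get that per-domain behavior is to track the next allowed request time for each host. The sketch below is an assumption about the shape of such a limiter, not the project's actual code; DomainRateLimiter and its parameters are hypothetical, and the jitter here only lengthens the delay.

```python
import asyncio
import random
import time
from urllib.parse import urlsplit


class DomainRateLimiter:
    """Spaces out requests to the same host; different hosts never block each other."""

    def __init__(self, min_delay=2.0, max_jitter=0.5):
        self.min_delay = min_delay
        self.max_jitter = max_jitter
        self._locks = {}         # host -> asyncio.Lock
        self._next_allowed = {}  # host -> monotonic time of the next permitted request

    async def wait(self, url):
        host = urlsplit(url).netloc
        lock = self._locks.setdefault(host, asyncio.Lock())
        async with lock:  # serializes callers for this host only
            now = time.monotonic()
            ready_at = self._next_allowed.get(host, now)
            if ready_at > now:
                await asyncio.sleep(ready_at - now)
            # Reserve the next slot: min_delay stretched by up to max_jitter.
            delay = self.min_delay * random.uniform(1.0, 1.0 + self.max_jitter)
            self._next_allowed[host] = time.monotonic() + delay
```

Each plugin calls `await limiter.wait(url)` before its request. A plugin waiting on site X holds only X's lock, so requests to other hosts proceed immediately.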
Results
With Camoufox handling the dealer network plugins:
- 14 active plugins covering 18 sources
- >95% bypass rate on previously blocked targets
- Average search time for a full 14-plugin sweep: 8–14 seconds
- Price data from dealer networks now consistently available, closing the biggest gap in Part 1
The price variance data got more interesting once dealer networks were included. For some part categories, the spread widened to 4–5x. OEM dealers consistently price at full retail regardless of availability; independent OEM suppliers discount heavily when inventory is plentiful.
In Part 3, I'll cover the normalization layer — the problem of matching the same part number appearing in four different formats across fourteen sources, and why fuzzy matching is the only viable approach.