Building OEM Hunter, Part 1: Why Automotive Parts Sites Are a Mess (and How I Architected Around It)

I needed a specific ABS sensor for a 2019 Tacoma. There is no central OEM parts database. Every dealer network runs its own catalog. Prices vary by 40–60% across sources for the identical part. The only way to find the best price is to check each one manually.

I opened six browser tabs, typed the same part number into six different search boxes, and spent twenty minutes comparing results. Found the part for $67 at an independent OEM supplier. The first dealer I'd checked quoted $142.

That was the moment I decided to build something.

Why It's Harder Than It Looks

Before writing a single line of code, I spent a few days manually cataloging the problem. What I found:

There is no standard API. Every supplier has a different interface. Some have search endpoints, some require navigating a multi-step catalog flow, and some serve part listings as rendered HTML with no consistent structure. A few still use Flash-era catalog viewers that are essentially inaccessible to any automated approach.

Part numbers aren't standardized. A single OEM part appears across suppliers as 18-B5032, 18B5032, 018-B5032, and WK18B5032: all the same component, four different representations. Any system that relies on exact string matching will miss most results.
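
To make the mismatch concrete, here is a deliberately naive sketch that collapses the four variants above into one key. It is not the normalization logic from Part 3. The function name `normalize_part_number` and the `VENDOR_PREFIXES` list are assumptions made up for this illustration.

```python
import re

# Hypothetical vendor prefixes. A real list would have to be built by
# observing how each source decorates its part numbers.
VENDOR_PREFIXES = ("WK",)


def normalize_part_number(raw: str) -> str:
    """Uppercase, drop separators, strip a known vendor prefix and leading zeros."""
    # Keep only letters and digits, so "18-B5032" becomes "18B5032"
    code = re.sub(r"[^A-Z0-9]", "", raw.upper())
    # Remove at most one known vendor prefix, never the whole string
    for prefix in VENDOR_PREFIXES:
        if code.startswith(prefix) and len(code) > len(prefix):
            code = code[len(prefix):]
            break
    # Drop zero padding, but keep the original if it was all zeros
    return code.lstrip("0") or code


variants = ["18-B5032", "18B5032", "018-B5032", "WK18B5032"]
normalized = {normalize_part_number(v) for v in variants}
print(normalized)
```

Rules this blunt will collide on real data: a genuine part number that happens to start with WK would lose its first two characters, for example. That is one reason exact matching and simple stripping both fall short.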

Bot protection is aggressive. Automotive dealer networks have invested heavily in anti-scraping infrastructure. Standard requests with a fake user-agent get blocked on first contact at most dealer sites. Even Playwright with a headless browser gets fingerprinted and blocked within a few requests. I'll cover this in detail in Part 2, where it became its own engineering problem.

The Architecture

I landed on a plugin-based orchestrator. Here's the core idea:

query (part number or description)
    ↓
Orchestrator
    ↓ fans out to all plugins simultaneously
[Plugin A] [Plugin B] [Plugin C] ... [Plugin N]
    ↓ all return normalized results
Result merger + ranker
    ↓
Sorted output: vendor, price, availability, shipping, URL

Each plugin is a self-contained Python module that knows how to talk to one specific source. The plugin interface is minimal:

class BasePlugin:
    async def search(self, part_number: str) -> list[PartResult]:
        ...

That's it. A plugin takes a part number, returns a list of results. How it gets those results (HTTP request, Playwright session, API call) is entirely its own business. The orchestrator doesn't know or care.
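
As an illustration of how little a plugin needs, here is a minimal sketch of one that satisfies the interface. Everything in it is hypothetical: `StaticCatalogPlugin` reads from an in-memory price list instead of a real site, the `PartResult` here is a trimmed-down stand-in for the full schema, and the URL is a placeholder.

```python
import asyncio
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class PartResult:
    # Trimmed-down stand-in for the full result schema
    vendor: str
    part_number: str
    price: Decimal
    url: str


class BasePlugin:
    async def search(self, part_number: str) -> list[PartResult]:
        raise NotImplementedError


class StaticCatalogPlugin(BasePlugin):
    """Hypothetical plugin backed by an in-memory price list."""

    def __init__(self, vendor: str, catalog: dict[str, Decimal]) -> None:
        self.vendor = vendor
        self.catalog = catalog

    async def search(self, part_number: str) -> list[PartResult]:
        await asyncio.sleep(0)  # stand-in for the network round trip
        price = self.catalog.get(part_number)
        if price is None:
            return []  # no match is an empty list, not an error
        url = f"https://example.com/parts/{part_number}"  # placeholder URL
        return [PartResult(self.vendor, part_number, price, url)]


plugin = StaticCatalogPlugin("Example Parts Co", {"18B5032": Decimal("67.00")})
results = asyncio.run(plugin.search("18B5032"))
missing = asyncio.run(plugin.search("NOT-A-PART"))
print(results, missing)
```

A real plugin would replace the dictionary lookup with an HTTP call or a browser session, but the orchestrator would see the same shape either way.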

This design has two big advantages:

Independent failure. When a site goes down, changes its structure, or starts blocking requests, only one plugin breaks. Everything else keeps working. I can fix and redeploy a single plugin without touching anything else.

Independent evolution. Some sites needed basic httpx requests. Others needed full browser automation. Some required session cookies from a login flow. Each plugin handles its own complexity; none of that leaks into the orchestrator.

The Orchestrator

The orchestrator uses asyncio to run all plugins concurrently. For 14 plugins, the total response time is roughly that of the slowest individual plugin, not the sum of all of them. In practice, most results come back within 3–5 seconds. The outliers (sites that require Playwright sessions) add another 8–12 seconds.

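import asyncio

async def search_all(part_number: str) -> list[PartResult]:
    # Build one coroutine per plugin; gather() runs them concurrently
    tasks = [plugin.search(part_number) for plugin in PLUGINS]
    # return_exceptions=True hands failures back as values instead of raising
    results = await asyncio.gather(*tasks, return_exceptions=True)
    # filter exceptions, flatten lists, deduplicate, sort by price
    ...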

Exceptions from individual plugins are caught and logged, not raised. If three out of fourteen plugins fail, you still get eleven sets of results. Fail-open is the right default here.
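
The fail-open behavior can be sketched end to end with stand-in plugins. This is an assumption-laden illustration, not the project's actual orchestrator: the three plugin functions, the dict-shaped results, and the logger name are all invented for the example, and deduplication is left out.

```python
import asyncio
import logging
from decimal import Decimal

logger = logging.getLogger("orchestrator")  # hypothetical logger name


async def fast_plugin(part_number: str) -> list[dict]:
    await asyncio.sleep(0.01)
    return [{"vendor": "A", "price": Decimal("142.00")}]


async def slow_plugin(part_number: str) -> list[dict]:
    await asyncio.sleep(0.05)
    return [{"vendor": "B", "price": Decimal("67.00")}]


async def broken_plugin(part_number: str) -> list[dict]:
    raise RuntimeError("site layout changed")


PLUGINS = [fast_plugin, slow_plugin, broken_plugin]


async def search_all(part_number: str) -> list[dict]:
    # With return_exceptions=True, a failing plugin yields its exception
    # object in the results list instead of aborting the whole gather.
    outcomes = await asyncio.gather(
        *(plugin(part_number) for plugin in PLUGINS),
        return_exceptions=True,
    )
    merged: list[dict] = []
    for plugin, outcome in zip(PLUGINS, outcomes):
        if isinstance(outcome, BaseException):
            logger.warning("plugin %s failed: %r", plugin.__name__, outcome)
            continue  # fail open: skip this source, keep the rest
        merged.extend(outcome)
    return sorted(merged, key=lambda r: r["price"])


results = asyncio.run(search_all("18B5032"))
print([(r["vendor"], r["price"]) for r in results])
```

The broken plugin costs one log line and nothing else. The two healthy plugins still return their results, sorted cheapest first.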

Normalizing the Output

Each source returns data in its own format. Plugins map whatever they find into a common schema, so the orchestrator only ever merges one kind of object:

from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal

@dataclass
class PartResult:
    vendor: str
    part_number: str          # normalized form
    raw_part_number: str      # as-found on the site
    price: Decimal
    currency: str
    availability: str         # "In Stock" | "Ships in X days" | "Backorder" | "Unknown"
    shipping_estimate: str
    url: str
    plugin_id: str
    timestamp: datetime

The part_number field is the normalized form I use for deduplication; if two plugins return results with different raw part numbers that normalize to the same code, I merge them into a single result with both vendor prices. The normalization logic is the subject of Part 3. It was the hardest part of this project.
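
The merge step itself is simple once the normalized code exists. Below is a minimal sketch, assuming a cut-down `Offer` record and a hypothetical `merge_offers` helper. It groups results by their normalized part number and keeps every vendor price, cheapest first.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Offer:
    # Cut-down stand-in for PartResult, with only the fields the merge needs
    vendor: str
    part_number: str       # normalized form
    raw_part_number: str   # as found on the site
    price: Decimal


def merge_offers(offers: list[Offer]) -> dict[str, list[Offer]]:
    """Group offers by normalized part number, cheapest first within each group."""
    grouped: dict[str, list[Offer]] = defaultdict(list)
    for offer in offers:
        grouped[offer.part_number].append(offer)
    return {
        code: sorted(group, key=lambda o: o.price)
        for code, group in grouped.items()
    }


offers = [
    Offer("Dealer X", "18B5032", "018-B5032", Decimal("142.00")),
    Offer("Supplier Y", "18B5032", "WK18B5032", Decimal("67.00")),
]
merged = merge_offers(offers)
print([(o.vendor, o.price) for o in merged["18B5032"]])
```

Two different raw part numbers end up under one key, which is the behavior described above. The quality of the result depends entirely on how trustworthy the normalized code is.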

First Results

With 14 plugins targeting 18 sources (some plugins cover multiple sub-sources), the initial results were encouraging. For the ABS sensor query that started all this:

11 out of 14 plugins returned results
Price range: $52 (independent OEM supplier) to $189 (OEM dealer)
3 plugins failed: 2 were blocked by a WAF under my initial plain-HTTP approach (covered in Part 2), and 1 targeted a source that had broken its API

The 3.6x price spread across sources ($189 vs. $52) confirmed the original intuition. If you're not checking multiple suppliers, there's a good chance you're overpaying.

In Part 2 I'll cover what happened when I tried to add dealer network sources, and the WAF bypass I had to build to make it work.