Verification

Verified vs spoofed bots

One of the fastest ways to get bot reporting wrong is to trust the user-agent string too quickly. A request that says “Googlebot” is making a claim. Verification is the layer that tells you whether that claim deserves trust.

Figure: claimed, verified, and spoofed bot states, and how validation changes trust.
Recognition and trust are not the same thing. Once you separate them, the story of your crawler traffic changes immediately.
The job of identification is to say what a request appears to be. The job of verification is to say whether you should believe it.

Why the distinction matters

If you do not separate claimed, verified, and spoofed traffic, you can draw the wrong conclusion from the same dashboard. A spike that looks like healthy search-engine attention may actually be impersonation. A quiet period may seem harmless until you realize the “bot traffic” you were watching was never trustworthy in the first place.

What a user-agent can and cannot tell you

The user-agent is still useful. It helps map a request to a likely bot name, owner, family, and category. But it is not proof. Any client can send a header that says Googlebot, Bingbot, GPTBot, or almost anything else that looks credible enough at first glance.

That is why collapsing everything into a simple boolean like “is bot” is not enough. It answers the recognition problem but ignores the trust problem.

What verification adds

Verification is the second layer. Depending on the bot family, that usually means checking source IP ownership, CIDR ranges, DNS patterns, or whatever validation model the bot owner documents publicly. Once you add that layer, the same traffic becomes much more operationally useful.

  • You can trust a verified hit differently from a merely claimed hit.
  • You can spot impersonation instead of counting it as legitimate crawl demand.
  • You can interpret spikes and gaps with much more confidence.
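One widely documented validation model is the reverse-DNS check with forward confirmation, which Google and Bing both publish for their crawlers. The sketch below assumes that model; the allowed hostname suffixes are drawn from the owners' public documentation and should be re-checked against the current docs before use.

```python
import socket

# Assumed hostname suffixes per bot family, per the owners' public docs.
# An empty entry means the family documents no rDNS validation model.
ALLOWED_SUFFIXES = {
    "googlebot": (".googlebot.com", ".google.com"),
    "bingbot": (".search.msn.com",),
}

def verify_by_rdns(ip: str, claimed: str) -> bool:
    """Reverse-DNS + forward-confirm check for a claimed bot identity."""
    suffixes = ALLOWED_SUFFIXES.get(claimed)
    if not suffixes:
        return False  # no documented validation model: cannot verify
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)  # reverse lookup
    except socket.herror:
        return False
    if not hostname.endswith(suffixes):
        return False  # hostname is outside the owner's documented domains
    try:
        forward_ips = socket.gethostbyname_ex(hostname)[2]  # forward confirm
    except socket.gaierror:
        return False
    # The hostname must resolve back to the original source IP,
    # otherwise the rDNS record itself could be spoofed.
    return ip in forward_ips
```

The forward-confirmation step matters: reverse DNS alone can be set to anything by whoever controls the IP block, so the round trip back to the source IP is what closes the loop.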

Why spoofed bots distort reporting

Spoofing is not just a security footnote. It changes the meaning of your visibility data. If a scraper or abusive client pretends to be a known crawler, your reporting can look healthier, more search-driven, or more AI-focused than it really is. That creates confusion for SEO, platform, security, and leadership teams all at once.

The practical status model

A useful product model is not simply “bot” or “not bot.” It is a trust ladder that tells the user how much confidence to place in each request.

Unknown

No useful match yet, so there is not enough context to classify or trust the request.

Claimed

The request matches a known bot signature, but its identity has not yet been validated.

Verified

The identity claim matches and the documented validation checks pass.

Spoofed

The user-agent claims a known bot identity, but the validation layer says not to trust it.

Why this changes decisions

Verified Googlebot activity can support discussions about crawl access, freshness, and site health. Spoofed Googlebot activity belongs in a completely different conversation. Claimed AI crawler traffic may be interesting, but it should not be treated as established fact until you know what that bot family can actually verify.

What a good bot dashboard should do

A good product should not flatten all bot traffic into a single line item. It should show what the request claimed to be, whether that claim is trustworthy, and how your interpretation should change based on that status. That is the shift from “interesting bot counts” to bot data that teams can actually act on.
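In reporting terms, that means counting per claimed bot *and* per trust status rather than one flat "bot traffic" number. A minimal sketch, with illustrative records:

```python
from collections import Counter

# Illustrative log records -- in practice these come from the
# identification and verification layers described above.
records = [
    {"claimed": "googlebot", "status": "verified"},
    {"claimed": "googlebot", "status": "spoofed"},
    {"claimed": "googlebot", "status": "spoofed"},
    {"claimed": "gptbot", "status": "claimed"},
]

# A flat "bot hits: 4" hides the fact that most "Googlebot"
# traffic here is impersonation.
counts = Counter((r["claimed"], r["status"]) for r in records)
for (bot, status), n in sorted(counts.items()):
    print(f"{bot:10} {status:9} {n}")
```

The same four requests read very differently once split this way: one verified Googlebot hit, two spoofed ones, and one AI-crawler claim that cannot yet be treated as fact.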

Bottom line: treat identification and validation as separate layers. Recognition tells you what the request says it is. Verification tells you whether you should believe it.