The Problem

Why Search Term Classification is Hard

A typical Google Ads account generates 20,000-100,000 unique search terms per month. You need to understand what people are searching for to optimize bids, find negatives, and allocate budget wisely.

But how do you classify tens of thousands of terms by intent? The obvious solutions don't work.

Why Not Just Use an LLM for Everything?

The first instinct: "AI can understand intent, so let's send every search term to Claude/GPT/Gemini."

This fails in production for four reasons:

$300+ per account

50,000 terms × $0.006 per call = $300 in API costs. Multiply by your client count.

15+ minutes per run

Even with parallel processing, thousands of API calls take time. Not practical for daily analysis.

Wasteful for obvious terms

You pay to classify 'buy nike shoes near me' when a rule could do it for free, and to classify one-impression terms you'll never see again.

Doesn't learn patterns

If the LLM classifies 'buy running shoes' as high intent, it won't help you classify 'buy hiking boots'.

Real example: An account with 50,000 search terms would cost $300 per classification run using GPT-4o nano. If you're running this monthly across 20 client accounts, that's $6,000/month in API costs alone - before you've improved a single campaign.
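The cost figures above follow directly from the article's numbers; a back-of-envelope check (the $0.006/call price and account counts are taken from the text):

```python
# Back-of-envelope cost model for the pure-LLM approach.
terms_per_account = 50_000
cost_per_call = 0.006          # assumed per-term API price from the article
client_accounts = 20

cost_per_run = terms_per_account * cost_per_call   # ~$300 per account per run
monthly_cost = cost_per_run * client_accounts      # ~$6,000/month across clients
```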

Why Not Just Use Rules?

The opposite extreme: "We'll just write rules. If the term contains 'buy' or 'price', it's high intent."

This works for obvious cases but breaks down quickly:

Misses nuance

Rules can't understand that 'best running shoes for marathon' is high intent even without the word 'buy'.

Constant maintenance

Every new product category or signal word requires manual rule updates.

No learning

Can't adapt to account-specific patterns or emerging search behavior.

Brittle at scale

Companies end up with hundreds of rules that still misclassify 30-40% of terms.

The result: a mountain of brittle rules, an enormous maintenance burden, and a system that can't adapt to new products or changing search behavior.

The Solution: Hybrid Classification

Use rules first for what they're good at (exact matches, clear signals), then use the LLM strategically on a small subset of high-volume ambiguous terms, then propagate those learnings to similar terms using machine learning.

~50% of spend classified by the LLM directly (high-volume ambiguous terms)
~45% of spend classified via LLM + ML propagation (similar patterns learned from the LLM)
~5% of spend classified via rules only (low volume, clear signals, brands)

Key insight: The 1,000 highest-volume ambiguous terms represent 40-50% of all impressions. Classify those with AI, propagate the patterns, and you've accurately classified 95%+ of your spend for ~$0.02.
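The dispatch order described above can be sketched as a single function: free checks run first, in sequence, and only what survives them would go to the LLM. This is a minimal illustration, not the repository's code; the signal lists are abbreviated, the `isascii()` check stands in for a real script detector, and the LLM/ML steps are replaced by the conservative default.

```python
# Minimal sketch of the hybrid dispatch order: cheapest checks first.
def classify(term, brand_terms, cache):
    if not term.isascii():                        # stand-in for a non-Latin script check
        return "non_latin"
    if any(b in term for b in brand_terms):       # exact brand match
        return "brand"
    if term in cache:                             # prior LLM verdict, reused for free
        return cache[term]
    if any(s in term for s in ("buy", "price", "near me")):
        return "high_intent"
    if any(s in term for s in ("how to", "guide")):
        return "low_intent"
    return "medium_intent"                        # conservative default (Step 10)
```

In the full pipeline, terms reaching the last line would first pass through the LLM and ML-propagation stages; the default only catches what those leave behind.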

Why These Intent Categories?

Each category represents a different bidding strategy or action. You can't optimize what you can't measure.

Brand

Your brand terms need separate bidding, tracking, and ROAS targets. Critical to identify correctly.

Navigational

Competitor searches have different intent than your brand - may indicate comparison shopping.

High Intent

Purchase signals ('buy', 'price', 'near me') indicate bottom-of-funnel readiness. Bid aggressively.

Medium Intent

Product browsing without clear signals. The largest category - needs volume to convert.

Low Intent

Research queries ('how to', 'guide') rarely convert directly. Build audiences, not sales.

Negative

Job searches, Reddit discussions, DIY content - WILL waste budget if not excluded.

Low Volume

Bottom 5% by impressions - insufficient data for confident classification or optimization.

Non-Latin

Different script indicates wrong geography or language targeting - needs investigation.
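Because each category maps to an action, the taxonomy is naturally expressed as configuration. A sketch, with the action strings condensed from the descriptions above:

```python
# The eight intent categories and the action attached to each (condensed).
CATEGORY_ACTIONS = {
    "brand":         "separate bidding, tracking, ROAS targets",
    "navigational":  "watch for competitor comparison shopping",
    "high_intent":   "bid aggressively",
    "medium_intent": "maintain volume; default bucket",
    "low_intent":    "build audiences, not sales",
    "negative":      "exclude to stop wasted budget",
    "low_volume":    "too little data to optimize",
    "non_latin":     "investigate geo/language targeting",
}
```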

How the Pipeline Works

The sequence is deliberate: free methods first, expensive methods on what's left. Each step reduces the workload for subsequent steps.

Step 1-4: Quick Wins (Free)

First, we knock out the easy stuff. Non-Latin characters? Flagged instantly. Bottom 5% by volume? Marked as low-volume. Your brand strings? Matched. Competitor brands? Categorized as navigational.

Result: ~15-20% of terms classified in milliseconds with perfect accuracy.
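Of these quick wins, the "bottom 5% by volume" cutoff is the least obvious to implement. A minimal sketch, assuming a dict of impressions per term:

```python
# Flag the bottom pct of terms by impressions as low_volume (Step 2).
def low_volume_terms(impressions, pct=0.05):
    ranked = sorted(impressions.items(), key=lambda kv: kv[1])
    cutoff = int(len(ranked) * pct)
    return {term for term, _ in ranked[:cutoff]}
```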

Step 5: Cache Lookup (Free)

Check if we've seen this term before and classified it with the LLM. On subsequent runs, cache hit rates are 60-80%.

Why this matters: The expensive work (LLM classification) gets reused forever. Run monthly, and most terms are already classified.
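The lookup itself is trivial: earlier LLM verdicts are reused, and only misses move on to later steps. A sketch (persistence, e.g. writing the cache to a JSON file between runs, is omitted):

```python
# Split terms into cache hits (reused labels) and misses (still to classify).
def split_by_cache(terms, cache):
    hits = {t: cache[t] for t in terms if t in cache}
    misses = [t for t in terms if t not in cache]
    return hits, misses
```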

Step 6-7: Signal Detection + Similarity (Free)

Step 6 matches true intent signals only: 'buy', 'price', 'near me' for high intent; 'how to', 'guide' for low intent. Step 7 then uses Levenshtein distance to catch brand typos ('nikee', 'nkie').

Result: Another 30-40% of terms classified by patterns, zero API cost.
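The typo-matching half can be written in a few lines with the classic dynamic-programming edit distance. The edit budget of 2 here is a deliberately generous illustration; a production version would scale it with term length:

```python
# Levenshtein distance via dynamic programming, for brand-typo matching (Step 7).
def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def looks_like_brand(term, brands, max_dist=2):
    return any(levenshtein(term, b) <= max_dist for b in brands)
```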

Step 8: LLM Classification (~$0.02)

Now the magic happens. We take the TOP 1,000 unclassified terms by impressions and send them to Gemini Flash. These are the ambiguous, high-volume terms that rules can't handle.

Why 1,000? They represent 40-50% of total impressions. This is where spend is happening.
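Selecting the batch is a top-N problem. A sketch, where `classify_with_llm` (not shown) is a hypothetical stand-in for the Gemini Flash call:

```python
import heapq

# Pick the top-n still-unclassified terms by impressions for the LLM (Step 8).
def pick_llm_batch(unclassified, impressions, n=1000):
    return heapq.nlargest(n, unclassified, key=lambda t: impressions.get(t, 0))
```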

Step 9: ML Propagation (Free)

TF-IDF, n-grams, word patterns, KNN similarity. If the LLM classified 'buy running shoes' as high intent, we learn that pattern and apply it to 'buy hiking boots', 'buy trail shoes', etc.

Result: 1,000 LLM classifications propagate to 3,000-5,000 similar terms. This is why the approach scales.
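A pure-stdlib sketch of the propagation idea, using character-trigram vectors and a nearest-neighbour lookup in place of the TF-IDF + KNN stack named above (which you would normally get from scikit-learn). The 0.5 similarity floor is an assumed tuning value:

```python
import math
from collections import Counter

def trigrams(term):
    padded = f"  {term} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

def cosine(a, b):
    dot = sum(a[g] * b[g] for g in a if g in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Copy each unlabeled term's label from its most similar LLM-labeled term,
# but only when similarity clears a floor (Step 9).
def propagate(labeled, unlabeled, min_sim=0.5):
    vecs = {t: trigrams(t) for t in labeled}
    out = {}
    for term in unlabeled:
        v = trigrams(term)
        best, best_sim = None, 0.0
        for src, sv in vecs.items():
            sim = cosine(v, sv)
            if sim > best_sim:
                best, best_sim = src, sim
        if best is not None and best_sim >= min_sim:
            out[term] = labeled[best]
    return out
```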

Step 10: Default Assignment (Free)

Anything left gets medium_intent. These are product searches without clear signals - a conservative default.

The Math That Makes It Work

Example: 50,000 search terms in an account

Step 1-4: ~15,000 terms classified by rules (brand, competitor, low volume, non-Latin)
Step 5: ~20,000 terms found in cache from previous runs
Step 6-7: ~10,000 terms classified by intent signals and similarity
Remaining: ~5,000 ambiguous terms
Step 8: top 1,000 by impressions sent to LLM → $0.02 cost
Step 9: ~3,500 terms classified by ML propagation (learned from LLM results)
Step 10: ~500 remaining → medium_intent default
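The stage counts above should account for every term in the example; a quick sanity check:

```python
# Each stage's count from the worked example should sum to 50,000 terms.
stages = {
    "rules (steps 1-4)": 15_000,
    "cache (step 5)": 20_000,
    "signals + similarity (steps 6-7)": 10_000,
    "llm (step 8)": 1_000,
    "ml propagation (step 9)": 3_500,
    "default (step 10)": 500,
}
total = sum(stages.values())
```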

Hybrid Approach: Cost $0.02, Speed 25 sec, Accuracy 95%+
Pure LLM: Cost $300, Speed 15+ min, Accuracy 98%
Pure Rules: Cost $0, Speed <1 sec, Accuracy 60-70%

You can build this yourself. The search term classifier is available as a skill in the 8020brain repository. It's production-ready code that you can customize with account-specific brand strings, competitor lists, and custom rules.

For many agencies, search term analysis is a time-consuming manual task that gets done monthly (if at all). With this approach, you can automate 99% of the work, run it daily, and focus on the strategic decisions instead of categorizing thousands of terms by hand.

Want to Build This Kind of Solution?

Join Ads to AI - where I teach Google Ads professionals how to build practical AI automations like this search term classifier.

Learn hybrid AI approaches that work in production
Build cost-effective solutions for real client problems
Get access to skills, scripts, and implementation guides