[The Marketing Analytics Intersect, by Avinash Kaushik] [1]
TMAI #472: BYE SEO, HELLO AEO! -P5
[Answer Engine Analytics | Methodologies & KPIs]
[ Web Version [2] ]
THE PRESENT: Answer Engine Optimization (AEO). Check.
THE FUTURE: Agentic AI Optimization (AAO). Done.
THE DRIVER: Answer Engine Analytics (AEA). Today!
The comprehensive implications of this transformative moment, from
_searching_ to _answering_, span your: A. Site experience. B. Digital
Tech Stack. C. 1P & 3P content + distribution strategies. D. Digital
advertising tactics. E. Managing the transition from advertising to
Humans to advertising to Agents.
All of this excitement creates a new need: Data!
At the moment, there is no Webmaster Tools equivalent. There are no
referring "keywords." No data from AEs. Our traditional tools, like
Google Analytics, have limited visibility (via referring strings).
Luckily, we have some early forays into qualitative and quantitative
data "extracted" from LLMs. These tools can guide our strategic
journey through implications A through E above.
I'm going to focus on two tools I rely on for Answer Engine
Analytics (AEA):
*
TRAKKR. [3] It is simple, well crafted, and has a free option. I
want you to start tracking tomorrow. Free makes that easier.
*
EVERTUNE. [4] I started working with Ed when Evertune (ET) was a
spreadsheet. It has blossomed into something smart, layered, and
sophisticated.
[Disclosure: I made a tiny, tiny, tiny angel investment in Evertune in
Oct 2024.]
I'll cover both below. Through my examples (and nit-picking!),
you'll learn how to evaluate any tool claiming to offer Answer
Engine Analytics (AEA).
4A. ANSWER ENGINE ANALYTICS: METHODOLOGIES & KPIS.
AEA tools are best thought of as competitive intelligence tools:
inferring behavior in an ecosystem, from the outside.
Because inference can be done in, literally, a thousand different
ways, and this space is so new… I want to start by emphasizing four
higher order bits covering how LLMs actually work. These are the
things I worry about first when someone throws a tool at me,
emphasizing it is God's gift to AEA. Each element below has a major
implication for signal quality when it comes to measurement.
ANSWER ENGINE HIGHER ORDER BITS.
The Answer Engines space has a number of _weird_ realities to account
for when you get into Analytics. You should know the impact of each.
A. PROBABILISTIC VS. DETERMINISTIC.
It is critical to internalize that LLMs, Answer Engines, are
_probabilistic_.
Translation: THEY VARY THEIR RESPONSES ON EVERY REPLY, even when the
account, person, IP, and all other bits are held constant. That's
because LLMs are, and this is a massive simplification, _next word
prediction models_. They generate text on the fly. Their responses
keep changing.
This is different from traditional Search. You and I could get
different responses to "_most affordable Asian destinations_," but
it was rare for me to get a new set of results from the same
browser/computer/location if I typed that query, say, every two
minutes. A more deterministic approach.
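You can see the probabilistic behavior for yourself in a few lines of
Python. A minimal sketch, assuming the official OpenAI Python SDK, an
API key in your environment, and an illustrative model name and
prompt: fire the identical request five times and compare the answers.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from your environment

# The exact same prompt, five times, everything else held constant.
for i in range(5):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user",
                   "content": "Most affordable Asian destinations?"}],
    )
    print(i, response.choices[0].message.content[:100])

# The five printed answers will differ: the model samples its "next
# word predictions" fresh on every call.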
B. LEARNING CORPUS & INFERENCE.
Like Bing & Google, different LLM models LEARN in different ways.
Hence, your brand could rank _x_ in one and _y_ in another.
LLMs have different data sources, processing, algorithms, weights,
_safety_ measures, and so much more.
This means each LLM casts a wide net, yet ends up with only a
partially complete picture.
C. "TEMPERATURE."
To control hallucinations, and how _creative_ the replies are, LLMs
have a _temperature_ setting.
Low temperature = direct answers. High temperature = more creativity.
Temperature is often a behind-the-scenes setting, though some models
allow you to play with it.
Temperature settings (by the LLM or you) can have a massive impact on
the answers returned (and the data you track).
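Under the hood, temperature divides the model's raw next-token scores
(logits) before they are turned into probabilities. That is the
textbook mechanism; individual AEs layer more on top. A toy Python
sketch with made-up logits shows why low temperature concentrates the
answer and high temperature spreads it out:

import math

def next_token_probs(logits, temperature):
    """Softmax over logits scaled by temperature."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]  # subtract max for stability
    total = sum(exps)
    return [round(e / total, 3) for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate next tokens

print(next_token_probs(logits, 0.2))  # ~[0.993, 0.007, 0.001] near deterministic
print(next_token_probs(logits, 1.5))  # ~[0.532, 0.273, 0.196] more "creative"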
D. MODEL SIZE/TYPE.
In the last year, one of the coolest developments in LLMs was "Deep
Research" / "Thinking." Because we discovered that, unlike the
old Google model of _results as soon as you start typing_ (!!), if we
gave the LLM time to think, the answers got materially better.
I can ask _"can you recommend a super cool cross-body bag for an NYU
student, with an understated sense of style?"_ in the "Quick"
mode or in the "Thinking" mode. Each offers a different answer, at
two price points: Foldie Crossbody & Maison de Sabre in _quick_
mode; Bellroy Tokyo Crossbody & Everlane Form Bag in _thinking_
mode. If I try that query on a small offline model running on my
Z Flip 7 phone, I'll get a different answer.
Please re-read and truly internalize the implications of A, B, C, and
D. The implications for choosing the optimal analytics tool are
immense.
Leaders want _deterministic_ data, with _confident_ analysis. That
is not possible, for now. We have a lot of data; we can be more
informed re what's going on inside AEs. You need to be comfortable
taking action after combining _probabilistic_ analysis with your
business experience.
The implications of A, B, C, and D could also easily cause paralysis.
That would be unwise.
AEA METHODOLOGY: QUESTIONS TO ASK.
Having deeply internalized the _higher order bits_ above, ask the
following four questions before you choose an AEA tool. It helps if
your nose's _bullshitake detection_ setting is on high.
1. HOW DOES THE PROMPTING WORK?
For Search Engines, Google and Bing just gave us what the customers
were typing (key words/phrases). For Answer Engines, there's no such
thing: no one is sharing the prompts the customers are typing. The
AEA tool builders build a prompting engine that attempts to replicate
what customers type, hence the MOST CRITICAL AEA tool feature to
figure out…
*
_Do you have to write your own prompts?_
*
_Does the vendor write them for you?_
*
_How do they triangulate what customers are typing?_
*
_How deep do the prompts go (just the name of the brand, products, or
more)?_
*
_How do the prompts manage brand name variations (misspellings,
abbreviations)?_
*
_How do they identify competitors (fantasy competitors identified by
the Brand, or actual competitors in the LLMs)?_
Sweat this one. Asking company employees to come up with the prompts
is an awful idea; we don't understand our customers (trust me). Look
for as much intelligence and automation in the prompting as
possible.
(Evertune uses an Intelligent Prompt Generator, which crafts thousands
of custom prompts for your category, product features, use cases, and
competitor comparisons. This gets you closer to reality in an
environment where you don't know the inputs.)
Special Question: _How do they cluster prompts into topics (prices,
features, sentiment, etc.)?_ Ex: The best approach for measuring brand
sentiment is to ask the model to write reviews for your brand. A poor
approach would be to ask "_what are the best handbags_"; it
won't get you the full picture of what the model thinks about your
brand.
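To make the clustering idea concrete, here is a toy sketch in Python.
The prompts are hypothetical, and real tools almost certainly use
richer embeddings or LLM classifiers rather than TF-IDF + KMeans; the
point is the shape of the technique, not any vendor's actual method.

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical prompts of the kind an AEA tool might fire at the models.
prompts = [
    "best price on a leather crossbody bag",
    "cheapest designer handbag under 300 dollars",
    "most durable tote for daily commuting",
    "which handbag brands hold up for years",
    "are Coach bags considered stylish",
    "handbag brands with an understated look",
]

# Vectorize the prompts, then group them into three topic clusters
# (roughly: price, durability, style).
vectors = TfidfVectorizer(stop_words="english").fit_transform(prompts)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(vectors)

for label, prompt in sorted(zip(labels, prompts)):
    print(label, prompt)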
2. HOW DOES THE TOOL ENSURE INSIGHTS ARE MEANINGFUL?
Items C and D above combine to create the challenge that every answer
to the same prompt might be different. You want to ensure the
"answers analysis" is statistically significant, and able to
separate the real from the random in the LLMs' output.
(Evertune uses thousands of prompts to get a distribution of
answers for meaningful insights. Then, the tool uses "_dynamic
sampling_" to ensure that you uncover statistically valid patterns.)
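Why thousands of prompts? Back-of-the-envelope math makes the point.
A Python sketch with hypothetical counts: the same observed 40%
mention rate is nearly meaningless at 30 runs, and genuinely
actionable at 1,000.

import math

def mention_rate_ci(mentions, runs, z=1.96):
    """95% normal-approximation confidence interval for the share of
    answers that mention the brand."""
    p = mentions / runs
    half_width = z * math.sqrt(p * (1 - p) / runs)
    return (round(p, 3),
            round(max(0.0, p - half_width), 3),
            round(min(1.0, p + half_width), 3))

print(mention_rate_ci(12, 30))     # (0.4, 0.225, 0.575) <- anywhere from ~22% to ~58%!
print(mention_rate_ci(400, 1000))  # (0.4, 0.37, 0.43)   <- a pattern you can act on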
Special Question: _How is the AEA tool accounting for the AI SLOP
puked out by all LLMs?_ By now you all know AEs puke out tokens that
sound real… until you actually read them!
(Evertune deliberately uses a statistics-led approach to address the
AI's tendency to spew tokens.)
3. HOW DOES THE TOOL TRIANGULATE ITS INSIGHTS?
Remember, we don't actually have access to the prompts being typed
by humans.
As company employees, you can write thousands of prompts for what your
customers might type, and not come within a hundred miles of what they
actually type. The tool will face the same problem when (if) it is
writing thousands of prompts.
Ask them: _How are you triangulating the prompts to be closer to what
the Brand's actual customers are typing?_
(Before prompting and getting data, Evertune does three things to
triangulate: A. It collects first-party data from tons of consumer
Apps to model the average user's experience. B. It has built a panel
of 25 million Americans to understand _what_ they are searching for,
language, frequency, and _how_ they are searching. C. They use the
LLMs' direct API access to identify the model's baseline behavior vs.
what they see in the consumer panel. At the end of A, B, and C, they
gain the ability to stuff their Intelligent Prompting Mechanism with
real customer behavior from the mobile apps and the consumer panel.)
4. HOW MUCH DATA PUKING DO YOU GET VS. _INSIGHTS_?
Data = What.
Insights = Why.
Actions = Why turned into Profits.
A lot of AEA tools I'm seeing are just puking a lot of _what_, often
with questionable methodologies. This looks impressive in sales demos.
It does not take too long for the realization to dawn that empty
calories are not good for your health.
Ask the AEA tool vendor:
*
_How do you derive the insights you recommend?_
*
_What assumptions and biases go into them?_
*
_How much analysis, segmentation, does your tool allow?_
(Evertune's latest iterations have a new cluster of reports with
insights re how to change your content strategy to improve your
_citation_ rate, or improve your AI Brand Index score. Lots and lots
more work to come in this space. Ex: I want much richer, more
directive, insights re how to turbocharge my 1P and 3P content
strategies.)
As a paying Subscriber of TMAI (merci!), you know: Methodologies slay
metrics.* Before you start using a tool, spend time asking the
questions above. Pick the vendor who answers simply, and points out
the glaring bits they are unable to measure. That should build
confidence.
* Slay as Millennials say it, not as Gen Z say it. 🙂
ANSWER ENGINE ANALYTICS: SUCCESS KPIS.
In my Answer Engine Analytics work, I've found the below metrics to
be super productive. I've been able to apply them to _Why_ and
identify profitable actions. You will find them in different tools.
This space is evolving at a rapid clip. A new foundational model seems
to drop every other week! In six months, I might discover additional
metrics that are worthy. As a Premium Subscriber, you'll hear of
them first.
1. VISIBILITY SCORE.
I'm really excited about this one because there was never anything
like it in the old Search world.
Visibility Score is a close cousin of _Unaided Brand Awareness_.
It measures _the percentage of times your brand is returned by the
model when the user's prompt did not include the brand's name._
UNAIDED: _Which handbag do you recommend for a teen girl heading to
university in a hot climate?_
AIDED: _Which Kate Spade handbag do you recommend for a teen girl
heading to university in a hot climate?_
Visibility Score measures the first one, which I appreciate as it is
a harder problem for a Brand to solve, and the impact is immensely
resilient. Here's how it looks in ET…
[Evertune: Visibility Score.]
Coach has pretty good Visibility Scores. Seeing variations is of
value, ex: MK is high on Gemini but low on ChatGPT. LV is crushing
both ChatGPT and Gemini (but their average score is getting hosed by
their low Visibility in Meta AI - they are a 24!).
TRAKKR has a metric called PRESENCE SCORE, a close cousin of
Visibility Score in Evertune. From the Help docs, it is unclear if it
is in response to aided or unaided prompts.
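Neither vendor publishes its exact matching logic, so treat this as
the naive version of both metrics: sample a pile of unaided answers,
then count the share that mention your brand (with the alias handling
from question 1 above). A Python sketch, with hypothetical answers:

def visibility_score(answers, brand_aliases):
    """% of unaided answers mentioning the brand under any alias (0-100)."""
    hits = sum(
        any(alias.lower() in answer.lower() for alias in brand_aliases)
        for answer in answers
    )
    return 100 * hits / len(answers)

# Hypothetical sampled answers to an unaided prompt.
answers = [
    "Try the Coach Willow Tote or a Kate Spade crossbody...",
    "Longchamp Le Pliage is a solid budget-friendly pick...",
    "Coach, Michael Kors, and Tory Burch all make durable bags...",
]

print(round(visibility_score(answers, ["Coach"]), 1))  # 66.7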
2. AVERAGE POSITION.
It is a close cousin of a metric from the old Google blue links
experience, also called Avg. Position.
It measures the average position of your brand in the Answer Engine
response...
[Evertune: Average Position.]
On Gemini, the brand Coach appears at an Average Position of 7; on
ChatGPT it is 5.8.
The closer to position 1, the better.
Putting Visibility Score and Average Position together is insightful.
Coach has a very high Visibility Score (hurray!), but a poor Average
Position (dang!). This insight highlights the urgency of focusing on
earning influence (via 3P influence; see TMAI #470 [5]).
Slightly confusing… TRAKKR also reports position using a metric
called Visibility Score. It is computed using: 1st place = 10 points,
2nd place = 9 points… 10th place = 1 point. Total points are
presented as an indexed score on a 100 scale. In a super clever trick,
Trakkr applies square root scaling to make the score differences more
meaningful and to prevent artificial inflation.
[Trakkr: Position Score.]
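The Help docs do not spell out exactly where the square root is
applied, so here is one plausible reading in Python; my
reconstruction, not Trakkr's actual formula:

import math

def trakkr_style_position_score(positions, runs):
    """One plausible reconstruction: 1st place = 10 points ... 10th
    place = 1 point, points indexed against the maximum possible,
    then square-root scaled per the docs' description."""
    points = sum(11 - p for p in positions if 1 <= p <= 10)
    raw = points / (10 * runs)   # share of the maximum possible points
    return 100 * math.sqrt(raw)  # square-root scaling

# Brand appeared at positions 2, 5, and 9 in 10 sampled answers
# (and was absent from the other 7).
print(round(trakkr_style_position_score([2, 5, 9], 10), 1))  # 41.2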
3. AI BRAND SCORE.
AI Brand Score was created to make things easier for our Extremely
Senior Leaders. It is a compound metric created from the combination
of Visibility Score and Average Position.
An AI Brand Score of 100 means you are in the 1st position, 100% of
the time.
In Evertune, each subsequent position decays your visibility by 10%.
So, being in 2nd position has 10% less weight; 3rd position has 10%
less weight than 2nd. _Yada, yada, yada._
[Evertune: AI Brand Score.]
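Assuming the decay is multiplicative (10% per position, which is how
I read the description above; Evertune's exact weighting may differ),
the computation, and its relationship to Average Position, looks like
this in Python:

def average_position(positions):
    """Mean position across the answers where the brand appeared at all."""
    return sum(positions) / len(positions)

def ai_brand_score(positions, runs):
    """A 1st place mention is worth 1.0, each later position worth 10%
    less (0.9 ** (position - 1)). 1st place in 100% of runs = 100."""
    return 100 * sum(0.9 ** (p - 1) for p in positions) / runs

# Brand appeared at these positions in 10 sampled answers (missing from 4).
positions = [1, 3, 2, 7, 1, 5]

print(round(average_position(positions), 1))    # 3.2
print(round(ai_brand_score(positions, 10), 1))  # 49.0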
The AI Brand Score does a better job of calculating an attention
curve, because Average Position can be misleading. Each of the
thousands of prompts shows a wide range of potential positions, for
the same brand on the same prompt! (Review A, C, and D above.)
As a Marketer, once you realize that Evertune is measuring the
_unaided brand awareness_ of an LLM/Answer Engine, AI Brand Score
does something cooler.
At Tapestry, under the stewardship of our CGO Sandeep Seth [6],
we've boldly invested in transformative brand marketing. The
positive impact of that on humans is visible in our current revenue
and profits (which are public [7]). Now, the AI Brand Score will help
us see the impact of all that brand marketing on LLMs! _How do the
"robots" think of us?_
In the short-term, a higher AI Brand Score will ensure the AEs return
us, our products, as the answer more often.
In the long-term, as Agentic AI takes over shopping from Humans
(review TMAI #471 [8]), the "robots'" internalization of our
brand marketing will ensure that Agents buy from us because the
special magic of our brand transcends price. 🙂
ANSWER ENGINE ANALYTICS: DIAGNOSTIC METRICS.
In addition to the big three KPIs above, there is a clump of
diagnostic metrics (review TMAI #448 [9]) that help me identify
insights, and convert them into actions.
1. AI EDUCATION SCORE.
To increase your Brand's influence with LLMs, you are going to have
to execute a new and improved 3P INFLUENCE STRATEGY (review TMAI #470
[5]). For that, you will need to know which third-party domains are
relevant and useful for your company/products/services.
AI Education Score rates domains by how much they might help you get
your content into models. It is a compound metric (10-point scale),
calculated by looking at whether the domain permits crawling, the
relevance of the domain to the category, and the propensity for the
domain to be cited.
PR, Affiliate, and Earned Media teams: this score is your new BFF.
Caution: Identifying whether a URL is included in a citation or source
can be misleading, as there are domains that influence the model
itself but are never cited. Ex: We know Instagram posts impact Meta
AI, but Instagram is rarely included as a source. Hence the more
sophisticated, multi-dimensional approach above from Evertune.
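For intuition only, here is what a compound score of that shape could
look like in Python. The three inputs mirror the dimensions described
above; the weights (and the whole formula) are my assumption, not
Evertune's actual math.

def ai_education_score(allows_crawling, category_relevance,
                       citation_propensity, weights=(0.2, 0.4, 0.4)):
    """Illustrative 10-point composite. Relevance and citation
    propensity are scored 0-10 here; the weights are assumed."""
    crawl = 10 if allows_crawling else 0
    w_crawl, w_relevance, w_citation = weights
    return (w_crawl * crawl + w_relevance * category_relevance
            + w_citation * citation_propensity)

# A crawlable domain, highly relevant to the category, often cited.
print(round(ai_education_score(True, 9, 8), 1))  # 8.8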
2. BRAND SHARE OF VOICE.
bSOV is calculated by taking the top 50 domains, estimating how many
pages are about the product category, and then calculating what % of
those pages include the brand (our brand).
It is super useful for sense-checking your relative volume vs.
competitors on the most important domains.
Ex: Bag Vanity has an AI Education Score of 10.0 in our category.
Coach's bSOV on it is 6.7%, LV is 10.2%, and Prada is 7.5%. On Marie
Claire, Coach is 1.5%, Prada is 12.1%.
See⦠Actionable.Ā š
3. MENTION SHARE.
Another helpful 3P content distribution diagnostic metric.
The Sources report identifies which domains/URLs are specifically
included in the answers that the LLMs are providing to user
_resolution questions._
Evertune computes how often a URL (say Bag Vanity, Marie Claire) is
mentioned as a source in the answer provided by the LLM.
Mention Share is simply your percentage of total mentions.
Ex: For Coach, for the URL Marie Claire, the Mention Share is 1.7%.
But. In ET I can segment the data. When the focus is Price, Coach's
Mention Share jumps up to 3.3%.
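Mention Share, plus the segmentation, is just counting with a filter.
A Python sketch with hypothetical (source, topic) pairs extracted
from sampled answers, showing how the share moves when you segment by
topic:

def mention_share(mentions, source, topic=None):
    """Share of total source mentions for one domain, optionally
    filtered to a prompt topic (Price, Features, ...)."""
    rows = [m for m in mentions if topic is None or m["topic"] == topic]
    hits = sum(1 for m in rows if m["source"] == source)
    return 100 * hits / len(rows)

# Hypothetical source mentions extracted from sampled answers.
mentions = (
    [{"source": "marieclaire.com", "topic": "Price"}] * 2
    + [{"source": "bagvanity.com", "topic": "Price"}] * 4
    + [{"source": "vogue.com", "topic": "Features"}] * 6
)

print(round(mention_share(mentions, "marieclaire.com"), 1))           # 16.7 overall
print(round(mention_share(mentions, "marieclaire.com", "Price"), 1))  # 33.3 on Price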
ANSWER ENGINE ANALYTICS: SEGMENTATION CAPABILITIES.
_All data in aggregate is crap_. - Me. Long time ago. [10]
I appreciate the ability to segment the data by the Success KPI I'm
interested in. I can do that for AI Brand Score and Visibility Score
below.
It is also helpful to segment by model. The data below is for ChatGPT,
but I can segment and view any other model (I am surprised by just how
many people use Deepseek!).
[Evertune: AI Brand Score Segmented, Deepseek.]
I can pull up data for any of my many, many competitors.
The paid version of Trakkr also has segmentation capabilities.
Here's the segmented sentiment analysis report, as an example…
[Trakkr: Sentiment by Segment.]
As you explore tools, it is worth remembering: Every tool will have
massive data puking capabilities, few will have Success KPIs and
Diagnostic Metrics that are actually useful, and only the rare ones
will have deep segmentation capabilities.
Overvalue that last one.
NEXT WEEK.
I'll share my favorite reports, the ones with a higher likelihood of
delivering actionable insights.
I'll do so using a step-by-step process I've perfected (for now!):
*
Read the WORDCLOUD REPORT. Find the words that are small that you want
to be big.
*
Use CONSUMER PREFERENCES REPORT to identify your strengths and
weaknesses.
*
Use the AI EDUCATION BRIEF REPORT to write content to build on your
strengths and combat weaknesses.
[Special Note: Don't try to _bullshitake_ the LLM/AE. If you have
weaknesses that are legitimate, no amount of your 1P and 3P content
will work; the LLMs have way, way, way more sources than you can
get to. It is always better to actually fix a weakness across your
products & business.]
*
Use the CONTENT ANALYTICS REPORT to publish your content on the
domains with high AI Education Scores.
*
Monitor the impact of all this work using the OVERVIEW REPORT and the
AI Brand Score KPI.
*
Win.
Lots of screenshots, lots of details, lots to get you from 0 to 900
mph in xy seconds!
BOTTOM LINE.
In a space that is evolving by the day, it is critical to hitch your
ride to the very best methodology available. Of the tools I've
analyzed, 95% die right here. It is so easy to identify: no matter
how pretty the reports, the methodology cannot stand up to the
tiniest poking.
Then, as was the case with SEO & Webmaster tools, it is critical to
separate the KPIs from the Metrics from the Vanity… Ensure that the
ones you do pick can pass the _three layers of the So What test_.
[11]
Carpe diem.
-Avinash.
Thank you for being a TMAI Premium subscriber - and helping raise
money for charity.
Your Premium subscription covers one person. It's fine to forward
occasionally. Please do not forward or pipe it into Slack for the
whole company. We have group plans, just email me.
[Subscribe [12]] | [Web Version [2]] | [Unsubscribe [13]]
©2022 ZQ Insights | PO Box 10193, San Jose, CA 95157, USA
Links:
------
[1] link
[2] link
[3] link
[4] link
[5] link
[6] link
[7] link
[8] link
[9] link
[10] link
[11] link
[12] link
[13] link
[14] link
[15] link
[16] link