A Google Leaks Deep Dive
Written for & first appeared on Sisters in SEO – quoted by Chima Mmeje for Moz.
The Google API Leak has revealed several engagement data points used as ranking signals. In total, the leak includes 2,596 documents with 14,014 attributes (API features).
So what started all of this?
Well, an anonymous source (later revealed) took the leak to Rand Fishkin, who called on Mike King to help authenticate the documents. They each wrote their analysis, and the SEO industry went wild.
Ultimately Google confirmed the authenticity of the leaked documents to The Verge. Google didn’t intend for this to be leaked…or did they?! Regardless of intent, it should be noted that the leak does seem to corroborate some of the information from DOJ evidence, Google white papers, and patent applications.
As someone who likes reading Google’s patents, I take the Google API Leak as validation & an opportunity to geek out a little, nothing more, nothing less.

TL;DR:
Engagement data, including clickstream information and user interactions, plays a significant role in determining relevance and is quite likely a subset of the ranking signals for search results. This raises important questions about data collection and transparency, necessitating clear communication from Google to maintain user trust. – Stefanie Morris
Google’s How Search Works documentation says, “Search algorithms look at many factors and signals.” So what are signals? Things like the words in your search query, the relevance and usability of pages, the expertise of sources, and your location & other settings. Google’s documentation goes on to say, “The weight applied to each factor varies depending on the nature of your query.”
A signal here is something that can be used in the algorithms and systems that make calculations to help determine rankings.
This is accomplished by using signals to create data, broadly called ‘aggregated and anonymized interaction data’. With that data, Google is able to “assess whether search results are relevant to queries.” That data is then used to create a signal for the machine learning systems to “better estimate relevance” of the search results.
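To make that pipeline a bit more concrete, here is a minimal Python sketch of turning raw interactions into aggregated per-query, per-URL counts and a crude relevance estimate. Everything here (the field names, the 60-second long-click cutoff) is my own illustration, not something documented in the leak.

```python
from collections import defaultdict
from typing import NamedTuple

class Interaction(NamedTuple):
    query: str              # the (anonymized) search query
    url: str                # the result that was shown
    clicked: bool           # whether the result was clicked
    dwell_seconds: float    # time spent before returning to the SERP

def aggregate_interactions(events, long_click_cutoff=60.0):
    """Collapse raw events into per-(query, url) counts, a stand-in for
    'aggregated and anonymized interaction data'."""
    stats = defaultdict(lambda: {"impressions": 0, "clicks": 0, "long_clicks": 0})
    for e in events:
        bucket = stats[(e.query, e.url)]
        bucket["impressions"] += 1
        if e.clicked:
            bucket["clicks"] += 1
            if e.dwell_seconds >= long_click_cutoff:
                bucket["long_clicks"] += 1
    return stats

def relevance_estimate(bucket):
    """Toy 'signal': share of impressions that ended in a long click."""
    if bucket["impressions"] == 0:
        return 0.0
    return bucket["long_clicks"] / bucket["impressions"]
```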
Key Engagement Data Attributes
The leak mentions several engagement data points, or Attributes, that could contribute to Google’s ranking signals. Traditional engagement data attributes include user engagement duration, bounce rate, pages per session, CTR, and social interactions. In this blog post, I’m abstracting some of those so we can discuss specific Attributes from the API documentation.
The documentation around ContentAttributions, NavBoost, Clickstream Data, User Experience & Personalization, and Page Titles all touches on user engagement data. Although the weight and current use of these features is speculative, the Attributes mentioned provide us with valuable insights.
ContentAttributions
This stores information about the original source of content, which can influence how content is perceived and shared, potentially affecting user interactions and time spent on a page.
“The following protobuf is used to store an attribution from one page to (usually) one other page, giving credit for the content. This information is used during ranking to promote the attributed page. This protobuf is copied from a quality_contra::SelectedAttribution. See //quality/contra/authorship/attribution and https://qwiki.corp.google.com/display/Q/ContentTrackingContentAttribution.”
Attributes
- freshdocsOutgoing (type: list(GoogleApi.ContentWarehouse.V1.Model.ContentAttributionsOutgoingAttribution.t), default: nil) – Selected outgoing attributions extracted on FreshDocs.
- offlineOutgoing (type: list(GoogleApi.ContentWarehouse.V1.Model.ContentAttributionsOutgoingAttribution.t), default: nil) – Selected outgoing attributions extracted via offline MR jobs.
- onlineOutgoing (type: list(GoogleApi.ContentWarehouse.V1.Model.ContentAttributionsOutgoingAttribution.t), default: nil) – Selected outgoing attributions extracted online on Alexandria.
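As a thought experiment, a ContentAttributions record can be pictured as a tiny data structure: one page credits another as the original source, and the credited page becomes a candidate for promotion. The class and function names below are mine; the leak only tells us the attribution exists and “is used during ranking to promote the attributed page.”

```python
from dataclasses import dataclass

@dataclass
class OutgoingAttribution:
    """Illustrative stand-in for an outgoing attribution record:
    the current page credits `attributed_url` as the original source."""
    source_url: str      # page where the attribution was extracted
    attributed_url: str  # page being credited for the content

def promotion_candidates(attributions):
    """Collect the URLs that receive credit. How much they are promoted
    during ranking is not documented in the leak."""
    return {a.attributed_url for a in attributions}

# Example: a syndicated article credits the original publisher.
attribs = [OutgoingAttribution("https://example.com/repost", "https://original.example.org/story")]
print(promotion_candidates(attribs))
```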
Navboost
One of the most important ranking factors, based on click data from Google Chrome. Pandu Nayak, VP of Search at Google, was quoted in the DOJ trial transcript as saying, “Navboost is important…plenty of other signals are also important…”

Google examines clicks and engagement on searches both during and after the main query (referred to as a “NavBoost query”).
NavBoost Click Attributes
ResearchScienceSearchNavboostQueryInfo
“The information representing one navboost query for the dataset source_url.”
Attributes
- impCount (type: number(), default: nil) – imp_count stores an estimate of the number of impressions for this tuple.
- lccCount (type: number(), default: nil) – lcc_count stores an estimate of the number of long clicks for this tuple. NOTE: It is similar to query_doc_count, but calculated in different manner.
- query (type: String.t, default: nil) – The query string.
- queryCount (type: number(), default: nil) – The query_count stores the counts on this query.
- queryDocCount (type: number(), default: nil) – The query_doc_count stores the number of long-clicks on this pair.
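Reading those fields together, each record looks like a per-query tuple of impression and long-click counts. Below is a hedged sketch of what such a tuple might look like and one ratio you could derive from it; the derived metric is purely hypothetical, nothing in the leak says Google computes exactly this.

```python
from dataclasses import dataclass

@dataclass
class NavboostQueryInfo:
    """Mirrors the leaked ResearchScienceSearchNavboostQueryInfo field names;
    the semantics beyond the doc strings are guesses."""
    query: str
    query_count: float       # counts on this query
    query_doc_count: float   # long-clicks on this (query, doc) pair
    imp_count: float         # estimated impressions for this tuple
    lcc_count: float         # estimated long clicks for this tuple

def long_click_rate(info: NavboostQueryInfo) -> float:
    """Hypothetical derived ratio: long clicks per impression."""
    return info.lcc_count / info.imp_count if info.imp_count else 0.0
```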
QualityNavboostCrapsCrapsClickSignals
These signals are based on click data, particularly from Google Chrome, and indicate how strongly users engage with search results.
“Click / impression signals for craps. The tag numbers are the same as they were in the original CrapsData (below). This is deliberate.”
Attributes
- absoluteImpressions (type: float(), default: nil) – Thus far this field is only used for host level unsquashed impressions. When compressed (e.g., in perdocdata.proto, CompressedQualitySignals), this value is represented individually and thus is generally incompatible with the other values which are compressed as click-ratios.
- badClicks (type: float(), default: nil) –
- clicks (type: float(), default: nil) –
- goodClicks (type: float(), default: nil) –
- impressions (type: float(), default: nil) –
- lastLongestClicks (type: float(), default: nil) –
- unicornClicks (type: float(), default: nil) – The subset of clicks that are associated with an event from a Unicorn user.
- unsquashedClicks (type: float(), default: nil) – This is not being populated for the current format – instead two instances of CrapsClickSignals (squashed/unsquashed) are used. We are migrating to the new format where this field will be populated.
- unsquashedImpressions (type: float(), default: nil) – This is not being populated for the current format – instead two instances of CrapsClickSignals (squashed/unsquashed) are used. We are migrating to the new format where this field will be populated.
- unsquashedLastLongestClicks (type: float(), default: nil) –
This is also reflected in: IndexingSignalAggregatorSccSignal.
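To keep those CRAPS fields straight, here is a small sketch that models them as a record and derives a few illustrative ratios. The leak confirms the field names and notes that some compressed values are stored as click-ratios, but the actual formulas are not documented; the ones below are stand-ins.

```python
from dataclasses import dataclass

@dataclass
class CrapsClickSignals:
    """Field names mirror QualityNavboostCrapsCrapsClickSignals; defaults
    are 0.0 here instead of nil purely for convenience."""
    impressions: float = 0.0
    clicks: float = 0.0
    good_clicks: float = 0.0
    bad_clicks: float = 0.0
    last_longest_clicks: float = 0.0

def click_ratios(s: CrapsClickSignals) -> dict:
    """Hypothetical derived ratios (CTR, good/bad click share)."""
    impressions = s.impressions or 1.0
    clicks = s.clicks or 1.0
    return {
        "ctr": s.clicks / impressions,
        "good_click_share": s.good_clicks / clicks,
        "bad_click_share": s.bad_clicks / clicks,
    }
```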
QualityNavboostGlueVoterTokenBitmapMessage
“Used for aggregating query unique voter_token during merging. We use 4 uint64(s) as a 256-bit bitmap to aggregate distinct voter_tokens in Glue model pipeline. Number of elements should always be either 0 or 4. As an optimization, we store the voter_token as a single uint64 if only one bit is set. See quality/navboost/speedy_glue/util/voter_token_bitmap.h for the class that manages operations on these bitmaps.”
Attributes
- subRange (type: list(String.t), default: nil) –
- voterToken (type: String.t, default: nil) –
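The quoted description is specific enough to sketch: four uint64 words form a 256-bit bitmap of distinct voter_tokens, with a compact single-token form when only one token has been seen. How a token maps to a bit position is not documented, so the `% 256` mapping below is an assumption.

```python
from typing import Optional

class VoterTokenBitmap:
    """Sketch of the 256-bit voter_token bitmap described in the leak.
    Four 64-bit words aggregate distinct tokens; a single token is kept
    in compact form until a second one arrives."""

    WORDS = 4                      # 4 * 64 = 256 bits
    MASK64 = (1 << 64) - 1

    def __init__(self) -> None:
        self.sub_range = [0] * self.WORDS        # the 256-bit bitmap
        self.single_token: Optional[int] = None  # compact single-token form

    def add(self, voter_token: int) -> None:
        if self.single_token is None and not any(self.sub_range):
            self.single_token = voter_token      # first token: store compactly
            return
        if self.single_token is not None:        # second token: expand to bitmap
            self._set_bit(self.single_token)
            self.single_token = None
        self._set_bit(voter_token)

    def _set_bit(self, voter_token: int) -> None:
        pos = voter_token % 256                  # assumed token-to-bit mapping
        word, bit = divmod(pos, 64)
        self.sub_range[word] = (self.sub_range[word] | (1 << bit)) & self.MASK64

    def approx_distinct(self) -> int:
        """Number of bits set, an approximate distinct-voter count."""
        if self.single_token is not None:
            return 1
        return sum(bin(w).count("1") for w in self.sub_range)
```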
NavBoost Location Attributes
Geo-fencing has long been a popular topic in PPC circles. We know NavBoost uses location attributes from click data that account for country, state/province, etc., cross-referenced against device type (mobile vs. desktop). If the region or user-agent is missing some of that information, the query result may be processed universally.
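As a rough illustration of that slicing, click counts could be bucketed by country and device, with a catch-all bucket when either is missing. The bucket names and fallback behavior below are assumptions drawn from the paragraph above, not documented logic.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical per-(country, device) click buckets with a universal fallback.
clicks_by_slice: dict = defaultdict(int)

def record_click(country: Optional[str], device: Optional[str]) -> None:
    key = (country or "UNIVERSAL", device or "ANY")
    clicks_by_slice[key] += 1

record_click("US", "mobile")
record_click("US", "desktop")
record_click(None, None)   # missing region/user-agent: processed universally
print(dict(clicks_by_slice))
```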
There are many golden eggs for local SEO in the API Leak documentation for GeostoreFeatureProto.
And even though a different team works on Google Ads than on Google Search, I think SEO experts would be remiss not to reference the Google Ads API, particularly its audience & location attributes, when doing their own research.
Clickstream Data
Clickstream data is used to determine user engagement and influence organic rankings.
This data acts as a roadmap of a user’s online activity, capturing digital breadcrumbs that reveal which websites a user visited, the pages they viewed on each site, the time spent on each page, and their subsequent clicks.
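If you imagine one of those breadcrumbs as data, a single clickstream event could be as simple as the record below. The field names are illustrative, not attributes from the leak.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ClickstreamEvent:
    """One 'digital breadcrumb': which site and page were visited,
    how long the user stayed, and where they clicked next."""
    site: str
    page: str
    seconds_on_page: float
    next_click: Optional[str] = None

visit = ClickstreamEvent("example.com", "/pricing", 42.0, next_click="/signup")
```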
CompressedQualitySignals
“Google utilizes cookie history, logged-in Chrome data, and pattern detection (referred to in the leak as “unsquashed” clicks versus “squashed” clicks) as effective means for fighting manual & automated click spam.”
Attribute
- topUrl (type: list(GoogleApi.ContentWarehouse.V1.Model.QualitySitemapTopURL.t), default: nil) – A list of top urls with highest two_level_score, i.e., chrome_trans_clicks.

👀 chrome_trans_clicks…okay Google, we see you.
User Experience
IndexingMobileInterstitialsProtoDesktopInterstitials
“Concerns how interstitials are managed. Since interstitials can negatively impact user engagement by providing a poor user experience, this attribute’s usage in ranking could prioritize pages that balance monetization and user experience.”
Attribute
- isGoodForMobile (type: boolean(), default: nil) –
QualityNavboostCrapsCrapsClickSignals
Attribute
- unicornClicks (type: float(), default: nil) – The subset of clicks that are associated with an event from a Unicorn user.
- lowQuality (type: integer(), default: nil) – S2V low quality score: converted from quality_nsr.NsrData, applied in Qstar. See quality_nsr::util::ConvertNsrDataToLowQuality.
Personalization
There’s much that can be said about personalization, particularly when it comes to Clickstream Data. This attribute highlights some of the granularity that can be used to create signals:
Attribute
- type (type: String.t, default: nil) – The type of the event. The type depends on the OtherKeyword.source. OUTLOOK source fields must be one of: billing_information, directory_server, keyword, mileage, sensitivity, user, subject. All other fields are treated as a CUSTOM source field. The value can be free form or one of these predefined values: home, other, work.
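As a small illustration of that granularity, the per-source constraints listed above can be checked in a few lines. The constant sets mirror the values in the doc text; the helper function itself is mine.

```python
OUTLOOK_TYPES = {"billing_information", "directory_server", "keyword",
                 "mileage", "sensitivity", "user", "subject"}
CUSTOM_PREDEFINED = {"home", "other", "work"}

def is_valid_event_type(source: str, value: str) -> bool:
    """OUTLOOK sources must use one of the fixed values; any other source is
    treated as CUSTOM, which allows free-form or predefined values."""
    if source == "OUTLOOK":
        return value in OUTLOOK_TYPES
    return True  # free form is allowed, so CUSTOM_PREDEFINED is informational
```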
PersonalizationSettingsApiProtoLocalDiscoveryOpaRecipesContext
“LINT.IfChange Contexts regarding the preferences from OPA_RECIPES. For example, users can click a recipes and say they don’t like one cuisine. OpaRecipesContext will contain the doc_id/url of that recipes.”
RepositoryWebrefPersonalizationContextOutputs
“Details about personalization and contextual scoring decisions from Personalized Query Understanding (go/pqu). This message represents information about what kind of biasing was applied, including what type of data were used and how strongly. Intended to be used by client code for fine-tuning necessary ranking or triggering logic if it’s not possible to rely on the aggregated annotation confidence alone. To minimize unwanted dependencies and incorrect usage of the data this proto has restricted visibility. Please reach out to refx-pqu@google.com if you want to have access.”
Attributes
- outputs (type: list(GoogleApi.ContentWarehouse.V1.Model.RepositoryWebrefPersonalizationContextOutput.t), default: nil) – Detailed output scores per personalization type.
Page Titles
Page titles influence site ranking beyond individual pages. They’re critical for conveying the relevance of content to both users and search engines.
ScienceIndexSignal documentation highlights these two important Attributes:
- HtmlTitleFp (type: String.t, default: nil) – Fingerprint of the html title of the page. This is useful for checking if we have the same version of the page as websearch.
- Title (type: String.t, default: nil) – Title of the article. Its only filled in when the html title of the page isn’t good.
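HtmlTitleFp is described only as a fingerprint used to check whether two systems saw the same version of a page. Here is a hedged Python sketch of that idea, using an ordinary hash rather than whatever Google actually uses.

```python
import hashlib

def html_title_fingerprint(title: str) -> str:
    """Illustrative stand-in for HtmlTitleFp: a stable fingerprint of the
    HTML title. Google's real fingerprint function is not in the leak."""
    normalized = " ".join(title.split()).lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()[:16]

print(html_title_fingerprint("A Google Leaks Deep Dive"))
```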
DocProperties mentions the following attributes related to page titles:
- avgTermWeight (type: integer(), default: nil) – The average weighted font size of a term in the doc body
- badTitle (type: boolean(), default: nil) – Missing or meaningless title
- badtitleinfo (type: list(GoogleApi.ContentWarehouse.V1.Model.DocPropertiesBadTitleInfo.t), default: nil) –
- languages (type: list(integer()), default: nil) – A Language enum value. See: go/language-enum
- leadingtext (type: GoogleApi.ContentWarehouse.V1.Model.SnippetsLeadingtextLeadingTextInfo.t, default: nil) – Leading text information generated by google3/quality/snippets/leadingtext/leadingtext-detector.cc
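badTitle is glossed only as “Missing or meaningless title.” A toy heuristic in that spirit might look like the sketch below; the word list and length cutoff are my guesses, not Google’s criteria.

```python
from typing import Optional

MEANINGLESS_TITLES = {"", "untitled", "home", "new page", "index"}

def looks_like_bad_title(title: Optional[str]) -> bool:
    """Flag a missing or meaningless title (illustrative only)."""
    if not title:
        return True
    cleaned = title.strip().lower()
    return cleaned in MEANINGLESS_TITLES or len(cleaned) < 4
```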
Use of Engagement Data in Search Rankings
Engagement data, including clickstream information and user interactions, plays a significant role in determining relevance and is quite likely a subset of the ranking signals for search results. This raises important questions about data collection and transparency, necessitating clear communication from Google to maintain user trust.
Eric Lehman, who worked on the Google search quality team for 17 years, was questioned by the DOJ in the Google Antitrust case. He made two important distinctions regarding the use of click data: there’s training data and then there’s user data.
Kevin Indig summarized it like this:
- “Training data = Google uses click data to train systems like BERT / Rankbrain, etc. and launches changes during (core) algo updates.”
- “User data = Google measures who clicks on what in real time and adjusts ranks for all or just some users (personalization).”
Gary Illyes is on record saying that Google uses clicks to display SERP Features, so we know that Google has been using click data. The recent Google API Leak highlights topUrl as an attribute within GoogleApi.ContentWarehouse.V1.Model.QualitySitemapTargetGroup:
- topUrl (type: list(GoogleApi.ContentWarehouse.V1.Model.QualitySitemapTopURL.t), default: nil) – A list of top urls with highest two_level_score, i.e., chrome_trans_clicks.
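In other words, whatever two_level_score is (the docs equate it with chrome_trans_clicks), the attribute is just the top slice of URLs ranked by it. Here is a hedged sketch of that selection, treating the score as an opaque number.

```python
from dataclasses import dataclass

@dataclass
class TopURLCandidate:
    url: str
    two_level_score: float   # per the leak, i.e. chrome_trans_clicks

def top_urls(candidates, k=10):
    """Return the URLs with the highest two_level_score."""
    ranked = sorted(candidates, key=lambda c: c.two_level_score, reverse=True)
    return [c.url for c in ranked[:k]]
```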
Chrome Clickstream Data
Clickstream data serves as a comprehensive map of a user’s online journey, detailing which websites they visit, the pages they explore, the time spent on each page, and their subsequent clicks.
This information helps Google understand user behavior and preferences, influencing the ranking of search results. The data is used to train algorithms like BERT and RankBrain, which refine search results based on real-time user interactions.
Chrome Security
David Strom wrote “Google’s Web Environment Integrity project raises a lot of concerns” for SiliconANGLE in July of 2023, highlighting growing concerns around Chrome and data security. He goes on to say, “Remember the problems with web cookies? WEI takes this to a new level.
At its heart, WEI has some lofty goals of trying to combat browser fingerprinting abuses. This is a technique, also called HTML canvas fingerprinting, that uses a variety of tracking techniques to identify a user’s browser by cataloging the particular app, IP address, computer processor and operating system and other characteristics. The combination can be used to determine, for example, if I return to a particular website and deliver customized online ads and personalized content, which is sometimes creepy.
These fingerprints have been around for many years and began their life as part of the HTML v5 specifications. They are a very rich and detailed look at the inner workings of a user’s computer, and the data is collected automatically and without the user’s explicit approval with any web server.”

@nearcyan tweeted, “Google has already started forcing Web Integrity into Chromium despite it being a ‘proposal’ With WEI, users can be denied access for using non-approved browsers or hardware The open Internet is officially dead as soon as this is commonly implemented”
Navboost and User Clicks
Navboost, one of the prominent ranking factors, is closely tied to Chrome clickstream data. It leverages user engagement metrics to assess the quality and relevance of content.
QualityNavboostCrapsCrapsClickSignals gives a good overview of Attributes related to clicks:
- absoluteImpressions (type: float(), default: nil) – Thus far this field is only used for host level unsquashed impressions. When compressed (e.g., in perdocdata.proto, CompressedQualitySignals), this value is represented individually and thus is generally incompatible with the other values which are compressed as click-ratios.
- badClicks (type: float(), default: nil) –
- clicks (type: float(), default: nil) –
- goodClicks (type: float(), default: nil) –
- impressions (type: float(), default: nil) –
- lastLongestClicks (type: float(), default: nil) –
- unicornClicks (type: float(), default: nil) – The subset of clicks that are associated with an event from a Unicorn user.
- unsquashedClicks (type: float(), default: nil) – This is not being populated for the current format – instead two instances of CrapsClickSignals (squashed/unsquashed) are used. We are migrating to the new format where this field will be populated.
- unsquashedImpressions (type: float(), default: nil) – This is not being populated for the current format – instead two instances of CrapsClickSignals (squashed/unsquashed) are used. We are migrating to the new format where this field will be populated.
- unsquashedLastLongestClicks (type: float(), default: nil) –
Quality Rater Feedback and Mentions
Google’s algorithms also incorporate feedback from quality raters and mentions across the web. Quality rater feedback, accessible through the search API, directly impacts rankings by validating the relevance and quality of content. It would also seem that mentions of entities, such as personal names or company names, function similarly to backlinks, enhancing the authority and ranking of web pages.
Check out the Search Quality Evaluator Guidelines for further reading.
Pogo-Sticking and User Satisfaction
Google’s algorithms analyze user behavior to identify dissatisfaction with search results. Pogo-sticking, where users quickly return to the search results page after clicking on a result, signals that the initial page did not meet their needs. High pogo-sticking rates indicate low-quality content, prompting Google to adjust rankings accordingly.
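To make a “pogo-sticking rate” concrete, here is a toy way to measure it from dwell time and whether the user bounced back to the results page. The 10-second threshold is an arbitrary illustration; the leak doesn’t specify a cutoff.

```python
def is_pogo_stick(dwell_seconds: float, returned_to_serp: bool,
                  threshold: float = 10.0) -> bool:
    """A visit counts as pogo-sticking if the user returned to the SERP quickly."""
    return returned_to_serp and dwell_seconds < threshold

def pogo_rate(visits) -> float:
    """Share of (dwell_seconds, returned_to_serp) visits that pogo-sticked."""
    if not visits:
        return 0.0
    return sum(is_pogo_stick(d, r) for d, r in visits) / len(visits)
```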
Google’s ranking algorithms, while unknown as a whole, are based on a combination of factors. What we do know is that Google continuously updates its algorithms, often focusing on improving user engagement and satisfaction. Attributes that enhance user engagement are likely still in use but may have evolved in how they are applied or combined with new signals.
Implications for SEO Strategies
The weight of engagement data in Google’s ranking algorithms underscores the importance of user satisfaction. SEO strategies should focus on creating high-quality, relevant content that engages users and fulfills their search intent. Key considerations include:
- Improving User Experience: Ensuring that users find the information they seek quickly and efficiently, reducing pogo-sticking rates.
- Monitoring Engagement Metrics: Regularly analyzing clickstream data and user interactions to identify areas for improvement.
- Optimizing Page Titles and Mentions: Crafting compelling page titles and earning mentions across reputable sources to enhance authority and rankings.
Current Use and Future Trends
While the leak validates the use of these engagement metrics, it remains uncertain how their application may have evolved. Google’s algorithms are continuously updated, often with a focus on enhancing user engagement and satisfaction. We intend to keep aligning our strategies with evolving user behavior and search trends to optimize for user experience and engagement.