Google won’t comment on a potentially massive leak of its search algorithm documentation (2024)

Google’s search algorithm is perhaps the most consequential system on the internet, dictating what sites live and die and what content on the web looks like. But how exactly Google ranks websites has long been a mystery, pieced together by journalists, researchers, and people working in search engine optimization.

Now, an explosive leak that purports to show thousands of pages of internal documents appears to offer an unprecedented look under the hood of how Search works — and suggests that Google hasn’t been entirely truthful about it for years. So far, Google hasn’t responded to multiple requests for comment on the legitimacy of the documents.

Rand Fishkin, who worked in SEO for more than a decade, says a source shared 2,500 pages of documents with him in the hope that reporting on the leak would counter the “lies” that Google employees had shared about how the search algorithm works. The documents outline Google’s search API and break down what information is available to employees, according to Fishkin.

The details shared by Fishkin are dense and technical, likely more legible to developers and SEO experts than the layperson. The contents of the leak are also not necessarily proof that Google uses the specific data and signals it mentions for search rankings. Rather, the leak outlines what data Google collects from webpages, sites, and searchers and offers indirect hints to SEO experts about what Google seems to care about, as SEO expert Mike King wrote in his overview of the documents.

The leaked documents touch on topics like what kind of data Google collects and uses, which sites Google elevates for sensitive topics like elections, how Google handles small websites, and more. Some information in the documents appears to be in conflict with public statements by Google representatives, according to Fishkin and King.

“‘Lied’ is harsh, but it’s the only accurate word to use here,” King writes. “While I don’t necessarily fault Google’s public representatives for protecting their proprietary information, I do take issue with their efforts to actively discredit people in the marketing, tech, and journalism worlds who have presented reproducible discoveries.”

Google has not responded to The Verge’s requests for comment regarding the documents, including a direct request to refute their legitimacy. Fishkin told The Verge in an email that the company has not disputed the veracity of the leak, but that an employee asked him to change some language in the post regarding how an event was characterized.

Google’s secretive search algorithm has birthed an entire industry of marketers who closely follow Google’s public guidance and execute it for millions of companies around the world. The pervasive, often annoying tactics have led to a general narrative that Google Search results are getting worse, crowded with junk that website operators feel required to produce to have their sites seen. In response to The Verge’s past reporting on the SEO-driven tactics, Google representatives often fall back to a familiar defense: that’s not what the Google guidelines say.

But some details in the leaked documents call into question the accuracy of Google’s public statements regarding how Search works.

One example cited by Fishkin and King is whether Google Chrome data is used in ranking at all. Google representatives have repeatedly indicated that the company doesn’t use Chrome data to rank pages, but Chrome is specifically mentioned in sections about how websites appear in Search. In the screenshot below, which I captured as an example, the links appearing below the main vogue.com URL may be created in part using Chrome data, according to the documents.

[Screenshot: Google search result for vogue.com, with links appearing below the main URL. Image: Google]

Another question raised is what role, if any, E-E-A-T plays in ranking. E-E-A-T stands for experience, expertise, authoritativeness, and trustworthiness, a Google metric used to evaluate the quality of results. Google representatives have previously said E-E-A-T isn’t a ranking factor. Fishkin notes that he hasn’t found much in the documents mentioning E-E-A-T by name.

King, however, detailed how Google appears to collect author data from a page and has a field for whether an entity on the page is the author. A portion of the documents shared by King reads that the field was “mainly developed and tuned for news articles... but is also populated for other content (e.g., scientific articles).” Though this doesn’t confirm that bylines are an explicit ranking metric, it does show that Google is at least keeping track of this attribute. Google representatives have previously insisted that author bylines are something website owners should do for readers, not Google, because it doesn’t impact rankings.

Though the documents aren’t exactly a smoking gun, they provide a deep, unfiltered look at a tightly guarded black box system. The US government’s antitrust case against Google — which revolves around Search — has also led to internal documentation becoming public, offering further insights into how the company’s main product works.

Google’s general caginess on how Search works has led to websites looking the same as SEO marketers try to outsmart Google based on hints the company offers. Fishkin also calls out the publications credulously propping up Google’s public claims as truth without much further analysis.

“Historically, some of the search industry’s loudest voices and most prolific publishers have been happy to uncritically repeat Google’s public statements. They write headlines like ‘Google says XYZ is true,’ rather than ‘Google Claims XYZ; Evidence Suggests Otherwise,’” Fishkin writes. “Please, do better. If this leak and the DOJ trial can create just one change, I hope this is it.”

FAQs

How complicated is Google's search algorithm?

The Google Search algorithm is a complex system Google uses to decide how pages will rank in the search results. The algorithm is believed to consider hundreds of factors. Content relevance, quality, and the user experience (UX) are among the most important ones.

What is the most well-known algorithm that Google has ever utilized?

PageRank is no longer the only algorithm Google uses to order search results, but it was the first algorithm the company used, and it remains the best known.

What is the name of the algorithm invented by Google for choosing the most relevant results for a search?

One of the best-known and most influential algorithms for computing the relevance of web pages is the PageRank algorithm used by the Google search engine. It was invented by Larry Page and Sergey Brin while they were graduate students at Stanford, and it became a Google trademark in 1998.
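As a rough illustration of the idea behind PageRank, here is a minimal sketch of the textbook power-iteration version on a toy link graph. It is not Google's production system: the function name `pagerank`, the `toy_web` graph, the damping value of 0.85, and the iteration count are all assumptions chosen for the example.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]],
             damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Textbook PageRank by power iteration (illustrative only).

    `links` maps each page to the pages it links to. Every link target
    is assumed to also appear as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its damped rank evenly among its outlinks.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to "c".
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
top_page = max(ranks, key=ranks.get)
```

In this toy graph, `top_page` is `"c"`, the page with the most incoming links, while `"d"`, which nothing links to, ends up with the lowest score.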

What is true about Google's search engine algorithm that it uses to return search results?

Google uses a complex set of algorithms, of which PageRank is the best known, to determine search results. These algorithms take several factors into account, including relevance: Google assesses how relevant a webpage is to the user's search query.

Is Google the #1 search engine in the world?

Yes. Google is the most popular search engine in the world, capturing nearly 92 percent of the search market, so it's no wonder SEO specialists seek out any available piece of information about Google's ranking algorithm.

Which search algorithm is the most powerful?

Binary search is used to find the position of a specific value in a sorted array. It works on the principle of divide and conquer, halving the remaining search range at each step, and it is often considered the best searching algorithm for sorted data because it needs far fewer comparisons than checking every element in turn.
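The halving step described above can be sketched in a few lines. This is a minimal illustration rather than a reference implementation; the function name `binary_search` and the sample list are assumptions made for the example, and the input must already be sorted in ascending order.

```python
from typing import Optional, Sequence


def binary_search(items: Sequence[int], target: int) -> Optional[int]:
    """Return the index of `target` in a sorted sequence, or None if absent."""
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return None


# Hypothetical sorted data for illustration.
sorted_values = [2, 3, 5, 7, 11, 13, 17, 19]
found_index = binary_search(sorted_values, 11)   # index 4
missing_index = binary_search(sorted_values, 4)  # None: 4 is not in the list
```

Each pass through the loop discards half of the remaining range, which is where the speed advantage over a linear scan comes from.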

What is the latest Google algorithm update in 2024?

Google March 2024 core update officially completed

The core algorithm update, which was announced on March 5, lasted until April 19, taking a total of 45 days to roll out. This update was one of the largest and most complex updates ever released by Google due to the multiple core systems involved.

What is Google's latest algorithm?

On September 14, 2023, Google initiated the rollout of its latest algorithm update, dubbed the “September 2023 Helpful Content Update.” This update introduces an “improved classifier” and is expected to take approximately two weeks to fully implement.

What was the most used search engine before Google?

Yahoo. It was one of the first major search engines on the internet, and from 1998 to 2002 it was one of the most popular websites in the world. The company was founded in 1995 by Jerry Yang and David Filo, and it quickly grew to become a major player in the internet search market.

Who owns the patent to Google's original search algorithm?

According to Quora user Tom McFarlane, "The invention was made by Larry Page while he was a graduate student at Stanford University. As a result, the patent rights were assigned to Stanford. It was Stanford that applied for and was granted the patent. Google licensed the rights from Stanford."

How to rank higher on Google?

  1. Step #1: Improve Your On-Site SEO. ...
  2. Step #2: Add LSI Keywords To Your Page. ...
  3. Step #3: Monitor Your Technical SEO. ...
  4. Step #4: Match Your Content to Search Intent. ...
  5. Step #5: Reduce Your Bounce Rate. ...
  6. Step #6: Find Even More Keywords to Target. ...
  7. Step #7: Publish Insanely High-Quality Content. ...
  8. Step #8: Build Backlinks to Your Site.
May 15, 2024

Does Google still use PageRank?

Here's What It Is & How They Use It. Patrick Stox is a Product Advisor, Technical SEO, & Brand Ambassador at Ahrefs. He was the lead author for the SEO chapter of the 2021 Web Almanac and a reviewer for the 2022 SEO chapter.

What was the former name for Google?

Larry Page and Sergey Brin originally called their search engine BackRub. Soon after, BackRub was renamed Google (phew).

How many times has my name been googled?

Wondering how many times your name has been Googled? Unfortunately, there's no way to find out: Google Search, like other search engines, doesn't disclose individual search data. Any websites or companies claiming they can reveal an exact number are, ultimately, lying—some data brokers claim to do exactly this.

What is the Google Panda algorithm?

Google Panda is a search algorithm update introduced in February 2011. Panda's goal was to reduce the number of low-quality websites on search engine results pages (SERPs). It was one of Google's earliest updates aimed at controlling content quality.

Is Google's search algorithm a secret?

Google's internal documents have been leaked on GitHub, revealing secret details about the company's search engine algorithms. The leaked documents contain data about factors influencing search results, which are key to digital marketing and search engine optimization efforts.

What is the complexity of the search algorithm?

Binary search has a time complexity of O(log n), where n is the number of elements in the list. This means that the running time grows logarithmically with the size of the list: as the list grows larger, the time it takes to find an element increases only slowly, with each doubling of the list adding about one more comparison.
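To make the logarithmic growth concrete, here is a short worked calculation for binary search on a sorted list of n elements. The symbol C(n), standing for the worst-case number of comparison steps, is notation introduced here for illustration.

```latex
\[
C(n) = \lfloor \log_2 n \rfloor + 1
\]
\[
C(10^3) = 10, \qquad C(10^6) = 20, \qquad C(10^9) = 30
\]
```

Growing the list a millionfold, from a thousand to a billion elements, only triples the worst-case number of steps.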

What algorithm is used in Google Search?

PageRank (PR) is an algorithm used by Google Search to rank web pages in their search engine results.

How many lines of code is the Google Search algorithm?

No line count for the search algorithm alone has been made public. Google's entire code base has been reported at roughly two billion lines.

Author: Dong Thiel