Googlebot Begins Crawling With HTTP/2 Protocol

Google updated its Googlebot developer support page on November 12, 2020 to state that Googlebot can now attempt to download pages over the HTTP/2 protocol. The change is effective November 2020.

Google announced the change in September, and it is now formally in effect.

According to Google:

“Generally, Googlebot crawls over HTTP/1.1. However, starting November 2020, Googlebot may crawl sites that may benefit from it over HTTP/2 if it’s supported by the site.”

Why HTTP/2 Network Protocol

HTTP/2 is currently the latest standardized version of the HTTP network protocol. It allows for faster and more efficient transfer of data between a server and a browser (or Googlebot).

HTTP/2 reduces the time it takes for a web page to be delivered from a server to a browser. It also reduces overhead by compressing HTTP header fields.
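To give a feel for why header compression reduces overhead, here is a toy Python sketch. It is not the real HTTP/2 header compression format (which also uses a predefined static table, Huffman coding, and table size limits). It only models the core idea that a header sent once on a connection can later be referred to by a small index. The function names, the size estimate, and the sample headers are all made up for illustration.

```python
def encode_headers(headers, table):
    """Encode headers against a per-connection table (toy model).

    A header already in the table is sent as a small integer index.
    A new header is sent in full and remembered for later requests.
    """
    encoded = []
    for name, value in headers:
        entry = (name, value)
        if entry in table:
            encoded.append(table.index(entry))  # index reference
        else:
            encoded.append(entry)               # literal, sent in full
            table.append(entry)
    return encoded


def approx_size(encoded):
    """Rough size estimate: 1 byte per index, name + value length per literal."""
    return sum(
        1 if isinstance(item, int) else len(item[0]) + len(item[1])
        for item in encoded
    )


table = []  # state kept for the lifetime of one connection
request_headers = [
    (":method", "GET"),
    ("user-agent", "ExampleCrawler/1.0"),  # hypothetical crawler name
    ("accept", "text/html"),
]

first = encode_headers(request_headers, table)
second = encode_headers(request_headers, table)

print("first request:", approx_size(first), "bytes (approx.)")
print("second request:", approx_size(second), "bytes (approx.)")
```

In this model the second request is much smaller than the first, because every header it carries was already seen on the same connection.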

Under the previous version of the protocol (HTTP/1.x), a browser had to open multiple connections and download resources in parallel, because each connection could effectively handle only one request at a time.

With HTTP/2, Googlebot and browsers can take advantage of multiplexing. That means multiple resources can be downloaded as separate streams over a single connection, instead of opening multiple connections to download the same web page.

According to an official IETF FAQ page on GitHub:

“HTTP/1.x has a problem called “head-of-line blocking,” where effectively only one request can be outstanding on a connection at a time.

…Multiplexing addresses these problems by allowing multiple request and response messages to be in flight at the same time; it’s even possible to intermingle parts of one message with another on the wire.

This, in turn, allows a client to use just one connection per origin to load a page.”
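The multiplexing idea in the quote can be illustrated with a small Python model. This is only a sketch: real HTTP/2 frames are binary and carry flow control and priority information, none of which is modeled here. The function names, the frame size, and the sample response bodies are assumptions made for the example.

```python
from itertools import zip_longest


def split_into_frames(stream_id, body, frame_size=4):
    """Cut one response body into (stream_id, chunk) frames."""
    return [
        (stream_id, body[i:i + frame_size])
        for i in range(0, len(body), frame_size)
    ]


def multiplex(responses, frame_size=4):
    """Interleave frames from several responses onto one simulated connection."""
    per_stream = [
        split_into_frames(stream_id, body, frame_size)
        for stream_id, body in responses.items()
    ]
    wire = []
    # Take one frame from each stream in turn until all streams are drained.
    for round_of_frames in zip_longest(*per_stream):
        wire.extend(frame for frame in round_of_frames if frame is not None)
    return wire


def demultiplex(wire):
    """Reassemble each response by grouping frames on their stream ID."""
    bodies = {}
    for stream_id, chunk in wire:
        bodies[stream_id] = bodies.get(stream_id, "") + chunk
    return bodies


# Three resources for the same page, keyed by stream ID.
responses = {
    1: "<html>page</html>",
    3: "body{margin:0}",
    5: "console.log(1)",
}

wire = multiplex(responses)
print(wire[:3])           # the first frame of each stream, interleaved
print(demultiplex(wire))  # every response recovered intact
```

Because every frame is tagged with its stream ID, parts of different responses can share the same connection and still be put back together on the receiving side.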

The capabilities of HTTP/2 mean less server congestion and lower use of server resources.

Minimizing the strain on server resources is good for websites. Sometimes, not only Googlebot but many other bots hit a site at the same time.

The result is that the site becomes sluggish because so many server resources are in use. This is bad for users trying to view web pages. It is also bad for the publisher if Googlebot cannot crawl the site because the server is stretched to its limit by rogue bots like scrapers and hackers.

According to Google:

“…starting November 2020, Googlebot may crawl sites that may benefit from it over HTTP/2 if it’s supported by the site.

This may save computing resources (for example, CPU, RAM) for the site and Googlebot, but otherwise it doesn’t affect indexing or ranking of your site.”
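Since Googlebot only uses HTTP/2 where the site supports it, a publisher may want to check what their own server negotiates. The Python sketch below uses the standard library's ssl module to offer HTTP/2 ("h2") during the TLS handshake and report whether the server selects it. It shows only what the server negotiates with this client. It says nothing about whether Googlebot will actually choose HTTP/2 for the site. The function names are made up for the example, and "example.com" in the usage comment is a placeholder.

```python
import socket
import ssl


def build_alpn_context():
    """Create a TLS context that offers HTTP/2 ("h2") first, then HTTP/1.1."""
    context = ssl.create_default_context()
    context.set_alpn_protocols(["h2", "http/1.1"])
    return context


def supports_http2(host, port=443, timeout=5.0):
    """Return True if the server selects "h2" during the TLS handshake."""
    context = build_alpn_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.selected_alpn_protocol() == "h2"


# Usage (requires network access; replace the placeholder hostname):
# print(supports_http2("example.com"))
```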

Publishers Can Opt Out of HTTP/2 Crawling

It’s possible to opt out of HTTP/2 crawling. The server must be configured to respond with a 421 HTTP status code.

The 421 status code is defined by the Internet Engineering Task Force (IETF) as Misdirected Request. In this context, it signals to Googlebot that the server will not answer the request over HTTP/2.

According to the IETF:

“The 421 (Misdirected Request) status code indicates that the request was directed at a server that is not able to produce a response. This can be sent by a server that is not configured to produce responses for the combination of scheme and authority that are included in the request URI.”

Google’s developer page recommends:

“To opt out from crawling over HTTP/2, instruct the server that’s hosting your site to respond with a 421 HTTP status code when Googlebot attempts to crawl your site over HTTP/2. If that’s not feasible, you can send a message to the Googlebot team (however this solution is temporary).”
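As an illustration of what responding with a 421 could look like, here is a minimal Python sketch written as an ASGI application. It is one possible approach under stated assumptions, not Google's recommended configuration. In practice many publishers would set this up in the web server or CDN rather than in application code. The sketch assumes the server in front of the app passes the negotiated protocol in the ASGI scope's http_version field, and that a simple substring match on the User-Agent header is enough to identify Googlebot (it does not verify that the request really comes from Google). The status_for helper exists only to exercise the app without a running server.

```python
import asyncio


async def app(scope, receive, send):
    """Minimal ASGI app: answer 421 to Googlebot requests made over HTTP/2."""
    if scope["type"] != "http":
        return  # ignore lifespan and websocket events in this sketch

    headers = dict(scope.get("headers", []))
    user_agent = headers.get(b"user-agent", b"").decode("latin-1")

    # Assumption: a substring match on the User-Agent identifies Googlebot.
    is_googlebot = "googlebot" in user_agent.lower()
    over_http2 = scope.get("http_version") == "2"

    if is_googlebot and over_http2:
        status, body = 421, b"Misdirected Request"
    else:
        status, body = 200, b"Hello"

    await send({
        "type": "http.response.start",
        "status": status,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": body})


def status_for(http_version, user_agent):
    """Drive the app once with a fake request and return the status code."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {
        "type": "http",
        "http_version": http_version,
        "headers": [(b"user-agent", user_agent.encode("latin-1"))],
    }
    asyncio.run(app(scope, receive, send))
    return sent[0]["status"]


googlebot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(status_for("2", googlebot_ua))    # Googlebot over HTTP/2
print(status_for("1.1", googlebot_ua))  # Googlebot over HTTP/1.1
```

Only the combination of Googlebot and HTTP/2 receives the 421. Other visitors, and Googlebot over HTTP/1.1, are served normally.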

Citation

Googlebot Developer Page
https://www.google.com/webmasters/tools/googlebot-report

November 13, 2020

ezikiel
