Qwant Web crawler

Introduction

Qwant uses web crawlers to enhance its index and provide the best possible service. This page gives information about how they work and their behaviour on your websites.

User-agent

While crawling, we announce ourselves with different user-agents depending on the version of our crawler. The name Qwantbot always appears in our user-agents, so you can use it to identify our web crawlers.

Our user-agents are defined as:

Mozilla/5.0 (compatible; Qwantbot{-news}/X.Y_{worker_id}; +https://help.qwant.com/bot/)

⚠️ Note: Parts between {} are optional.

Here are a few examples of user-agents we might use when crawling your site:

Mozilla/5.0 (compatible; Qwantbot/1.0_12345; +https://help.qwant.com/bot/)
Mozilla/5.0 (compatible; Qwantbot-news/2.0; +https://help.qwant.com/bot/)
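
If you want to recognise these user-agents in your own logs or middleware, the template above can be turned into a regular expression. The following is only a sketch: the function name parse_qwantbot and the character sets accepted for the version and worker id are assumptions made for illustration, not an official specification. Since any client can send any User-Agent header, treat a match as a first hint only and confirm it with one of the verification methods described later on this page.

```python
import re
from typing import NamedTuple, Optional

# Pattern derived from the user-agent template above. The digits-only version
# and the \w+ worker id are assumptions, not an official grammar.
QWANTBOT_RE = re.compile(
    r"Qwantbot(?P<news>-news)?/(?P<version>\d+\.\d+)(?:_(?P<worker_id>\w+))?"
)


class QwantbotAgent(NamedTuple):
    news: bool                 # True when the optional "-news" part is present
    version: str               # the "X.Y" part
    worker_id: Optional[str]   # the optional part after "_", if any


def parse_qwantbot(user_agent: str) -> Optional[QwantbotAgent]:
    """Return the parsed Qwantbot fields, or None if the name is absent."""
    match = QWANTBOT_RE.search(user_agent)
    if match is None:
        return None
    return QwantbotAgent(
        news=match.group("news") is not None,
        version=match.group("version"),
        worker_id=match.group("worker_id"),
    )
```

Called on the first example above, parse_qwantbot returns the version and the worker id; called on a user-agent that does not contain Qwantbot, it returns None.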

Robots.txt

The crawler respects the robots exclusion standard described at https://www.robotstxt.org/orig.html
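
To preview how a rule aimed at our crawler would be read, you can test your robots.txt locally. The sketch below uses Python's standard urllib.robotparser as a stand-in; it assumes that the token Qwantbot is what you put in the User-agent line, and the /private/ path and example.com URLs are hypothetical. The parser our crawler actually uses may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps Qwantbot out of /private/ only.
# The "Qwantbot" token is an assumption based on the crawler name.
ROBOTS_TXT = """\
User-agent: Qwantbot
Disallow: /private/
"""


def qwantbot_may_fetch(robots_txt: str, url: str) -> bool:
    """Check a URL against robots.txt content for the Qwantbot token."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Qwantbot", url)
```

With the rules above, a URL under /private/ is refused while any other path on the same site is allowed.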

Verifying the Crawler

Reverse & Forward DNS Lookup

To check if a web crawler accessing your server is from Qwant, perform a reverse DNS lookup and verify that it resolves to a name ending with “qwant.com”.

Optionally, you can do a forward DNS lookup using the name from the previous step to confirm that it resolves back to the same IP.

For example, on Linux you can use the “host” command:

> host 91.242.162.1 
1.162.242.91.in-addr.arpa domain name pointer qwantbot-1-162-242-91.qwant.com. 
> host qwantbot-1-162-242-91.qwant.com 
qwantbot-1-162-242-91.qwant.com has address 91.242.162.1
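
The same two-step check can be automated. The sketch below is one possible way to do it with Python's standard socket module; the function name is_qwant_crawler and the injectable reverse/forward parameters are assumptions introduced here so the logic can be exercised without network access. It requires the name to be qwant.com or to end with ".qwant.com" (with the leading dot), which is slightly stricter than a plain suffix match and avoids accepting names such as notqwant.com.

```python
import socket
from typing import Callable, List


def _reverse(ip: str) -> str:
    """Reverse DNS lookup: IP address -> primary host name."""
    return socket.gethostbyaddr(ip)[0]


def _forward(hostname: str) -> List[str]:
    """Forward DNS lookup: host name -> list of IP address strings."""
    return [info[4][0] for info in socket.getaddrinfo(hostname, None)]


def is_qwant_crawler(
    ip: str,
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], List[str]] = _forward,
) -> bool:
    """Return True if ip reverse-resolves to qwant.com and resolves back."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:  # covers socket.herror and socket.gaierror
        return False
    if not (hostname == "qwant.com" or hostname.endswith(".qwant.com")):
        return False
    try:
        # Plain string comparison; IPv6 addresses written in a different
        # textual form would need normalising first.
        return ip in forward(hostname)
    except OSError:
        return False
```

In production you would call is_qwant_crawler(remote_ip) with the default resolvers and probably cache the result, since DNS lookups on every request are slow.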

Using IP ranges

The DNS lookup method described above is the preferred one.

Alternatively, you can identify our bots by matching the remote IP address of the HTTP request against this JSON file: qwantbot.json.

Refresh this list on a daily basis, as it can change at any time.
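
Once you have extracted the ranges from the file, the matching itself can be done with Python's standard ipaddress module. The sketch below assumes the ranges are available as a list of CIDR strings; the layout of qwantbot.json is not described on this page, so loading and extracting them is left out. The function name ip_in_ranges is hypothetical, and the ranges shown are reserved documentation prefixes used as placeholders, not our real ranges.

```python
import ipaddress
from typing import Iterable

# Placeholder ranges (reserved documentation prefixes), NOT real Qwant ranges.
PLACEHOLDER_RANGES = ["192.0.2.0/24", "2001:db8::/32"]


def ip_in_ranges(ip: str, cidr_ranges: Iterable[str]) -> bool:
    """Return True if ip falls inside any of the given CIDR ranges."""
    address = ipaddress.ip_address(ip)
    for cidr in cidr_ranges:
        network = ipaddress.ip_network(cidr, strict=False)
        # Only compare addresses and networks of the same IP version.
        if address.version == network.version and address in network:
            return True
    return False
```

For example, ip_in_ranges(remote_ip, ranges) would be called with the request's remote address and the list of ranges taken from your most recent daily copy of the file.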

Reporting problems

If something went wrong when we visited your website using our crawlers, we are sincerely sorry for the inconvenience.

Please report any problem caused by the crawlers by sending an email to qwantbot@qwant.com