Amazon Q “Skipping seed url xxxxx as it does not match filtering criteria.” error.


I’ve recently been playing with Amazon Q but ran across an issue when trying to index some web sites.

Our company, like many, has multiple information sources, and with our wide footprint of API connections and services it is difficult to stay across the details. AWS Summit 2024 in London piqued my interest in Amazon Q. Initially, I crawled our public website and the results were promising. Next up, I tried our wiki and project management sites. They require authorisation, and after a few issues (mainly typos on my part) the syncs were completing successfully. However, the sync was not pulling any data. The CloudWatch logs showed the error as:

Skipping seed url https://sub.ourdomain.com as it does not match filtering criteria.

My first thought was that the inclusion / exclusion filters I had set were the problem, but removing them made no difference. Next I looked at the authentication, but that was ok too.

A web search drew a blank, but eventually I found my way back to the docs. The lesson here is to read the docs, especially the sections marked ‘Important’ with a big red triangle. I had not configured our robots.txt file to allow crawling, and because the Amazon Q crawler respects robots.txt, it returned nothing.
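If you want to check what your robots.txt actually permits before kicking off another sync, Python's standard library can evaluate the rules for you. This is a minimal sketch: the URL and robots.txt content are made-up examples, and the exact user agent string the Amazon Q crawler sends should be confirmed in the AWS docs (here I just match against `*`).

```python
from urllib import robotparser

# Example robots.txt that permits all crawlers on all paths.
# (Swap in your site's real robots.txt content to test it.)
robots_txt = """\
User-agent: *
Allow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())

seed_url = "https://sub.ourdomain.com/"
allowed = parser.can_fetch("*", seed_url)
print(allowed)  # True: this robots.txt permits crawling the seed URL

# By contrast, a robots.txt with "Disallow: /" blocks everything,
# which is the situation that leads the crawler to skip the seed URL.
blocked = robotparser.RobotFileParser()
blocked.parse(["User-agent: *", "Disallow: /"])
print(blocked.can_fetch("*", seed_url))  # False
```

Running this against the robots.txt you actually serve is a quick way to confirm the crawler will (or won't) be allowed in before waiting on another sync cycle.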

Hardly the clearest error message, and as it did not turn up elsewhere in my searches, I hope this saves someone else some time!
