If, while crawling your website, Googlebot finds that a webpage (URL) is blocked by robots.txt but that URL has still been indexed, Google Search Console will report the "Indexed, though blocked by robots.txt" error. If there are URLs you don't want Google Search to index, you should not rely on the robots.txt file alone, because Google can index pages even when they are blocked by robots.txt. This error appears only in the new Search Console, and only if you have blocked URLs with your robots.txt file. Review the URLs listed in this section and find a different way to keep them out of the index.
What does "Indexed, though blocked by robots.txt" mean?
In Google Search Console, "Indexed, though blocked by robots.txt" means that the listed URLs (a directory or anything else) are blocked by robots.txt, which prevents Googlebot from crawling them, but does not tell Googlebot or other spiders not to index them (Google has also made changes to the robots.txt directives it supports). That is why URLs blocked by the robots.txt file are listed there, and the list can be huge depending on your website.
How to get rid of "Indexed, though blocked by robots.txt"?
The easiest fix for the "Indexed, though blocked by robots.txt" error is: if you no longer need those pages on your website, update your robots.txt and remove the pages from Google Search with the Remove URLs tool. Otherwise, make sure Googlebot can crawl those pages and sees a noindex meta tag on them, which prevents Google from indexing them.
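As a sketch of the second fix, assuming the blocked pages are ordinary HTML pages: each page you want kept out of the index carries a robots meta tag like the one below, and the page must not be disallowed in robots.txt at the same time, or Googlebot will never see the tag.

```html
<!-- Placed in the <head> of each page that should stay out of Google's
     index. Googlebot must be able to crawl the page to read this tag,
     so remove (or narrow) any robots.txt Disallow rule covering it. -->
<meta name="robots" content="noindex">
```

For non-HTML files such as PDFs, the same directive can be sent as an X-Robots-Tag HTTP response header instead.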
If you see a list of your webpages under "Indexed, though blocked by robots.txt", it is worth understanding what crawling and indexing actually mean and how Google's spiders handle each of them.
Difference Between Crawling and Indexing:
Crawling and indexing are separate processes. Crawling is how Google's spiders discover new and updated pages on your website so that their content can later be organized in the Google Search index, and robots.txt applies only to crawling.
Indexing is the rest of the job the spiders do after crawling completes: once Googlebot has discovered a page, Google tries to understand it and, according to its search algorithms, indexes it for the queries it is relevant to.
Once you understand the difference between crawling and indexing, you will understand why URLs are listed in Search Console as "Indexed, though blocked by robots.txt".
What to Do if Pages Are "Indexed, though Blocked by robots.txt":
You can simply leave them; this is not going to hurt your website in any way. In this case, the new Search Console is only reporting what is happening on your website and giving you information to improve it. The pages are blocked by robots.txt, and Googlebot respects robots.txt and does not crawl them, but they can still be indexed when those URLs are linked to from other pages on your site or from an external source, because Googlebot follows links from page to page as it crawls.
Google Respects robots.txt Directives
Google fully respects the instructions given in robots.txt directives and crawls accordingly, but note that Google has made changes to which robots.txt directives it supports (for example, it no longer honors the unofficial noindex directive inside robots.txt).
Log in to Search Console, navigate to the robots.txt Tester, and enter the URL you want to test against your robots.txt.
In the same section you can check the live version of your robots.txt file and your URLs, and click Submit to ask Google to fetch the live version.
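You can also sketch the same check outside Search Console with Python's standard urllib.robotparser module; the domain, paths, and Disallow rule below are placeholders for your own site:

```python
# Check whether a crawler such as Googlebot may fetch given URLs under a
# robots.txt file, using Python's standard library robots.txt parser.
from urllib.robotparser import RobotFileParser

# Placeholder rules; in practice you would load your site's real robots.txt.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Blocked by the Disallow rule above:
print(parser.can_fetch("Googlebot", "https://example.com/private/page.html"))  # False
# Not blocked, so crawling is allowed:
print(parser.can_fetch("Googlebot", "https://example.com/blog/post.html"))  # True
```

Remember that a False result here only means the URL cannot be crawled; as explained above, it can still end up indexed if other pages link to it.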