What Are Excluded Pages in the New Google Search Console?
Excluded pages are pages that Google has left out of its search index. Search Console treats Excluded as one of its top-level statuses and groups the affected URLs by the specific reason each one was left out. Review those reasons and fix the ones that are keeping out pages you want indexed.
These pages are typically not indexed, and Google considers that appropriate. They are either duplicates of indexed pages, blocked from indexing by some mechanism on your site, or not indexed for a reason that Google does not treat as an error.
Each URL is listed together with its detailed reason. A page can also show up as excluded simply because it is at an intermediate stage of the indexing process.
Why Pages Are Excluded in Google Search Console:
URLs with the Excluded status are not indexed in Google Search for one of a small set of reasons. The most common ones are listed below.
Status and Types of Excluded Pages in Search Console:
Alternate Page with Proper Canonical Tag:
An alternate page is a duplicate of another page. When crawling it, Google found a canonical tag that correctly points to the canonical version. This is the outcome Google intends, so if the URL points to the correct canonical there is nothing the webmaster needs to do.
If you have AMP pages, it is normal to see many of them under this status, because each AMP page carries a canonical tag pointing to its non-AMP version on the website.
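To confirm which canonical a page declares, you can read the tag straight from its HTML. The following is a minimal sketch using only Python's standard library; the names `CanonicalFinder` and `find_canonical` and the example.com URLs are made up for illustration, and it only reads what the page declares, not what Google ends up selecting.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        # Attribute names arrive lowercased; values may be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if "canonical" in attr_map.get("rel", "").lower().split():
            self.canonical = attr_map.get("href") or None


def find_canonical(html: str) -> Optional[str]:
    """Return the declared canonical URL, or None if the page has none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical AMP page whose canonical points at the non-AMP version.
amp_page = (
    '<html><head>'
    '<link rel="canonical" href="https://example.com/post/">'
    '</head><body>AMP version</body></html>'
)
print(find_canonical(amp_page))
```

If the printed URL is the non-AMP page you expect, the alternate-page status is working as intended.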
Crawl Anomaly:
A crawl anomaly is reported when Googlebot gets an error response while trying to fetch a page. If Googlebot receives a 4xx or 5xx response code, the fetch fails and the URL is flagged as a crawl anomaly.
If your pages appear under crawl anomaly with a failed fetch, run Fetch as Google or a live test of the URL in the new Search Console. If the page returns a 200 OK response rather than a 4xx or 5xx error, it is behaving normally.
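Alongside the live test, you can check the response code yourself. This is a rough sketch with Python's standard library; `fetch_status` and `classify_status` are illustrative names, the User-Agent string is invented, and the result reflects what your own machine receives, which may differ from what Googlebot sees.

```python
import urllib.error
import urllib.request


def classify_status(status_code: int) -> str:
    """Map an HTTP status code to the outcome described above."""
    if 200 <= status_code < 300:
        return "ok"
    if 400 <= status_code < 500:
        return "client error (4xx)"
    if 500 <= status_code < 600:
        return "server error (5xx)"
    return "other"


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Fetch a URL and return the final HTTP status code.

    urlopen follows redirects, so this is the status of the last hop.
    """
    request = urllib.request.Request(
        url, headers={"User-Agent": "status-check-sketch"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses are raised as HTTPError; the code is kept.
        return error.code


# Example (needs network access, URL is a placeholder):
# print(classify_status(fetch_status("https://example.com/")))
```

A result of "ok" matches the 200 OK case described above; either error class is worth investigating on the server.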
Crawled, Currently Not Indexed:
These pages were crawled successfully by Googlebot but are not currently indexed. The number of URLs with this status changes over time, and a page may be indexed later once it meets Google's guidelines.
Discovered, Currently Not Indexed:
This status means Google has discovered the pages but has not crawled them yet. It typically appears when Googlebot tried to crawl the URLs, for example those listed in your sitemap, but the site was overloaded, so Googlebot backed off and postponed the crawl.
This is why the last crawl date is not available for some URLs in the excluded pages report.
Excluded by Noindex Tag:
Googlebot crawled the page successfully but found a noindex directive, either in a robots meta tag (for example "noindex, nofollow") or in an X-Robots-Tag HTTP response header. Googlebot does not index a page that carries a noindex directive.
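Both places where a noindex directive can appear are easy to inspect. The sketch below is a simplified check, assuming you already have the page's HTML and response headers; `RobotsMetaFinder` and `has_noindex` are hypothetical names, and the header parsing ignores user-agent-prefixed values such as "googlebot: noindex".

```python
from html.parser import HTMLParser
from typing import Mapping


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots"> and "googlebot" tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() in ("robots", "googlebot"):
            for part in attr_map.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noindex(html: str, headers: Mapping[str, str]) -> bool:
    """Return True if the meta tag or the X-Robots-Tag header says noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    if "noindex" in finder.directives:
        return True
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            # Simplification: treats the value as a comma-separated list.
            if "noindex" in {p.strip().lower() for p in value.split(",")}:
                return True
    return False


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(has_noindex(page, {}))
```

If this reports a noindex directive on a page you want in search results, removing the directive is the fix.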
Not Found (404):
When Googlebot requests a URL and receives a 404 (Not Found) response, the URL is listed here. These URLs are reported separately from your indexed pages, and 404s on pages that are really gone do not harm your search rankings. If the URL should not return a 404, fix the page, or if the content has moved, set up a 301 permanent redirect to the current URL so that URL can be indexed.
Pages with Redirect:
If a URL returns a redirect, such as a 301 permanent redirect, the redirecting URL itself is not indexed, per Google's documentation.
Soft 404:
A soft 404 is reported when a page returns a 200 OK response but Googlebot expected it to return a 404, for example because the page content says the item does not exist. Such URLs are flagged as soft 404.
Blocked by Noindex Tag:
Excluded pages also include URLs that are blocked by a noindex directive. If you want these pages indexed in Google, remove the noindex tag and then validate the fix in Search Console.
Blocked by Page Removal Tool:
If you used the removal tool to ask Google to remove a URL from its search index, that URL is listed under excluded pages in Search Console.
Blocked by Robots.txt:
These are pages that a rule in your robots.txt file (or a server-level rule, such as one in .htaccess) tells Googlebot not to crawl. The pages will not be crawled, but they can still be indexed by Google, for example when other pages link to them, because crawling and indexing are two separate processes.
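To see whether a given URL is blocked from crawling, you can test it against your robots.txt rules. This is a small sketch using Python's built-in `urllib.robotparser`; the rules and URLs are invented for the example, and the standard-library parser does not implement every matching rule Google uses (such as wildcards), so treat the answer as an approximation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_crawl_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/private/page.html"))
print(is_crawl_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/blog/"))
```

Remember that a blocked result only concerns crawling; to keep a page out of the index, a noindex directive on a crawlable page is the usual tool.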
Blocked Due to Unauthorized Request (401):
When Googlebot tries to fetch a URL and receives a 401 response, the request is unauthorized and the page is not crawled. If you want these pages indexed, remove the authorization requirement so they no longer return 401.
Duplicate Page without Canonical Tag:
This means Googlebot found that the page has duplicates, none of them is marked as canonical, and Google does not consider this page to be the canonical one.
Duplicate Non-HTML Page:
This status applies when a non-HTML file, such as a PDF, duplicates another page that Google has chosen as canonical. Only the canonical URL is shown in Google Search.
Google Chose Different Canonical than User:
The page is marked as canonical for a set of pages, but Google thinks another URL in the set is a better canonical according to its guidelines. Google indexes that URL instead and flags this one, with the aim of serving better results to users.
Page Removed Because of Legal Complaint:
This status is self-explanatory: the page was removed from the index because of a legal complaint.
Queued for Crawling:
Googlebot has queued these pages for its next crawl. Check back later, as the status will change once they are crawled.
Submitted URL Dropped:
The page was submitted to Google for indexing but was dropped from the index for an unspecified reason.
Submitted URL Not Selected as Canonical:
This is a duplication issue. The URL was submitted to Google, but it is one of a set of duplicate URLs without an explicitly marked canonical. Google did not index this URL. Instead, it indexed the canonical it selected, because it thinks another URL is the better canonical.