How to Check How Google Sees Your Website

    Google strives to provide accurate results for every search query. It currently handles roughly 8 billion searches daily and holds over 90% of the global search engine market. To draw results from the fullest possible set of resources, Google has built a collection of information that spans the entire internet. This is referred to as the ‘Google index,’ and it is something that Google has continually updated and refined since its founding.

    When a new page is published on the internet, Google will detect and ‘crawl’ it. This means that it renders the page’s visual and text elements and converts them into a form that is more digestible for its purposes. Your website is a part of the index and each public page should be crawled, but the amount of information Google collects will vary by site and by page. There are various ways to monitor your status in the index, as well as ways to improve what is collected so that it is as accurate as possible.

    (And shout-out to Craig Whitney and Michela Leopizzi for the help researching and editing this post!)

    How to Check if My Website is Indexed by Google


    There are many ways to monitor the crawl status of a page, a collection of pages, or even an entire website from the perspective of a search engine.

    Some of these require various levels of access and site ownership, but some of the methods listed below are accessible to anyone on the web.

    The URL Inspection Tool

    The URL Inspection Tool is found inside of Google Search Console. Search Console is a must-have for any site owner and digital marketing professional working with a website.

    URL Inspection allows you to input a URL and returns virtually everything you need to know about a page’s crawl and indexation status in Google.

    URL Inspection can provide details such as:

    • Is the page on Google?
    • Was the URL discovered in an XML sitemap?
    • When was the page last crawled?
    • Does the page have mobile issues?
    • Is the page eligible for additional SERP enhancements (Breadcrumbs, Logo, Sitelinks searchbox, etc.)?

    Pro tip: use the URL Inspection API! If you want to check the crawl status of your entire website, you can combine a crawler like Screaming Frog with the URL Inspection API to check the crawl and indexation status of every page on your website. Keep in mind that there are API limits, though. Google’s Search Console URL Inspection API limits users to 2,000 queries per day, which may present a problem for larger websites.
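As a rough illustration of working within that limit, here is a minimal Python sketch. It assumes you already have a list of URLs (for example, exported from a crawler) and a valid OAuth 2.0 access token for a verified Search Console property. The helper names and the `mydomain.com` URLs are hypothetical; the endpoint and the response fields read in `summarize` follow the public URL Inspection API, but treat this as a starting point, not a finished integration.

```python
import json
import urllib.request

# Endpoint of the Search Console URL Inspection API.
INSPECT_ENDPOINT = "https://searchconsole.googleapis.com/v1/urlInspection/index:inspect"
DAILY_QUOTA = 2000  # daily query limit mentioned in the article


def build_inspection_body(page_url: str, site_url: str) -> dict:
    """Request body for one inspection; site_url is the Search Console property."""
    return {"inspectionUrl": page_url, "siteUrl": site_url}


def split_into_daily_batches(urls: list, quota: int = DAILY_QUOTA) -> list:
    """Chunk a list of URLs so each batch fits within one day's quota."""
    return [urls[i:i + quota] for i in range(0, len(urls), quota)]


def inspect_url(page_url: str, site_url: str, access_token: str) -> dict:
    """Send one inspection request (assumes a valid OAuth 2.0 access token)."""
    request = urllib.request.Request(
        INSPECT_ENDPOINT,
        data=json.dumps(build_inspection_body(page_url, site_url)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarize(result: dict) -> dict:
    """Pull a few index-status fields out of an API response."""
    status = result.get("inspectionResult", {}).get("indexStatusResult", {})
    return {
        "verdict": status.get("verdict"),
        "coverage": status.get("coverageState"),
        "last_crawl": status.get("lastCrawlTime"),
    }
```

With a site of 4,500 URLs, `split_into_daily_batches` would produce three batches, so a full pass would take roughly three days at the stated limit.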

    If you really want to see how Google is seeing your website, you’ll want to use the “View Crawled Page” function within the URL Inspection details.

    This shows the crawled HTML of the page. In other words, you can see the exact HTML that Google was able to see and crawl in order to “read” your webpage.

    This is incredibly helpful for ensuring that all of the main elements and content on the page are “readable” by Google, and that Google is seeing your website and webpage exactly how you intended.

    View the Source Code

    If you aren’t able to use the URL Inspection tool, you can still get some of the information I just mentioned.

    Viewing the source code doesn’t tell us exactly what Google is seeing. But as we all have come to learn, Google is a pretty smart search engine. It’s usually a safe bet that if you can find an element in the source code (make sure it isn’t commented out!), Google’s crawler (Googlebot) will be able to crawl and read it as well.

    You’ll need some experience in reading HTML to know what you’re looking for. But if you do have experience in HTML (even at a basic level), you should be able to search the source code and ensure all of the main elements are active, readable parts of the page.

    You can view the source by adding “view-source:” in front of your URL. This will open the HTML of the page in your browser (if you’re using Chrome).
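If you would rather not scan the source by eye, the check described above (is the text actually live in the markup, or only sitting inside a comment?) can be sketched in a few lines of Python. This is a simplified illustration using only the standard library. The function name `locate_phrase` and the sample markup are made up for the example, and it says nothing about how Googlebot itself parses a page.

```python
from html.parser import HTMLParser


class ActiveTextCollector(HTMLParser):
    """Collects text that is live in the markup, separately from comments."""

    def __init__(self):
        super().__init__()
        self.active_text = []
        self.comment_text = []
        self._skip_depth = 0  # greater than zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.active_text.append(data)

    def handle_comment(self, data):
        self.comment_text.append(data)


def _normalize(text: str) -> str:
    """Collapse runs of whitespace so line breaks in the source don't matter."""
    return " ".join(text.split())


def locate_phrase(html: str, phrase: str) -> str:
    """Return 'active', 'commented out', or 'missing' for a phrase."""
    parser = ActiveTextCollector()
    parser.feed(html)
    parser.close()
    target = _normalize(phrase)
    if target in _normalize(" ".join(parser.active_text)):
        return "active"
    if target in _normalize(" ".join(parser.comment_text)):
        return "commented out"
    return "missing"


# Hypothetical page source, e.g. pasted from the view-source tab.
sample = (
    "<html><body><h1>Pricing plans</h1>"
    "<!-- <p>Old promo banner</p> -->"
    "</body></html>"
)
print(locate_phrase(sample, "Pricing plans"))     # active
print(locate_phrase(sample, "Old promo banner"))  # commented out
```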

    Chrome Developer Tools

    Want to see the source code and interact with it in real time? You can do that using Chrome Developer Tools.

    If you’re using Google Chrome as your browser, simply right-click on the page and click the “Inspect” item in the menu. This allows you to do many things, including:

    • Inspect the page and interact with individual elements
    • See active CSS details for respective HTML elements
    • Check for errors on the network tab
    • Check page status code on the network tab

    Search Operators

    Another way to see how Google is viewing your website is to use search operators. For those unfamiliar with them, a search operator is a specific string used within Google to narrow your results.

    The most helpful operators for seeing how Google sees your website are:

    • “site:” + “mydomain.com”: this will help you see how many pages from your website Google has indexed (mydomain is a placeholder here).
    • “cache:” + “URL”: this will allow you to view Google’s cache of a specific URL. The cache is effectively a snapshot of the rendered page as Google saw it on the respective crawl date (found in the summary text at the top of the page). Note that Google retired its cached pages feature in 2024, so this operator may no longer return results.
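If you check the same operators often, it can be handy to generate the search URLs instead of typing them. The snippet below is a small sketch of that idea. The function name is made up for the example, and `mydomain.com` is the same placeholder used above.

```python
from urllib.parse import urlencode


def google_search_url(operator: str, value: str) -> str:
    """Build a Google search URL for an operator query such as site:mydomain.com."""
    return "https://www.google.com/search?" + urlencode({"q": f"{operator}:{value}"})


print(google_search_url("site", "mydomain.com"))
# https://www.google.com/search?q=site%3Amydomain.com
```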

    How to Affect the Google Crawl

    Now that we know how to check how Google is viewing your website and your webpages, how can you impact the way Google is crawling and interacting with your site? Here are a few common methods:

    Meta Tags

    Meta tags are directives inserted into the HTML of a webpage that give Google and other webcrawlers specific direction on how to crawl it.

    There are two prominent meta tags that I want to mention here: “noindex” and “nofollow.”

    Noindex tells search engines that they can crawl the URL, but that the page should not be included in their index. In other words: you can crawl the page, but do not make it eligible for search results. This is commonly used for landing pages that are a part of non-SEO marketing campaigns (social campaigns, Paid Search, etc.).

    Nofollow provides search engines with the suggestion* that they should not follow any links. This can be issued at the page level or at the individual link level.

    *Note: there is a very specific reason I used “suggestion” instead of “directive.” Google had historically treated “nofollow” as a directive, but in 2019 started treating “nofollow” as a “hint” instead.
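In the page source, these usually appear as a tag like `<meta name="robots" content="noindex, nofollow">` in the head, or as `rel="nofollow"` on an individual link. The following Python sketch shows one way to audit a page’s HTML for both. The class and function names are illustrative, and it only looks at the HTML you pass in.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects robots meta directives and links marked rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.page_directives = set()
        self.nofollow_links = []

    def handle_starttag(self, tag, attrs):
        attributes = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attributes.get("name", "").lower() in ("robots", "googlebot"):
            # The content attribute is a comma-separated list of directives.
            for directive in attributes.get("content", "").split(","):
                if directive.strip():
                    self.page_directives.add(directive.strip().lower())
        elif tag == "a" and "nofollow" in attributes.get("rel", "").lower().split():
            self.nofollow_links.append(attributes.get("href", ""))


def audit_robots_directives(html: str) -> dict:
    """Report page-level noindex/nofollow and any link-level nofollow."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return {
        "noindex": "noindex" in parser.page_directives,
        "nofollow_page": "nofollow" in parser.page_directives,
        "nofollow_links": parser.nofollow_links,
    }
```

Running this against a page you expect to rank is a quick way to catch a stray noindex that was left over from staging.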

    The URL Inspection Tool

    Notice a common theme yet? URL Inspection is another valuable tool, particularly when you want to influence Google’s crawler.

    Using URL Inspection, it is possible to submit a URL to Googlebot’s crawling queue. This does not guarantee the page gets crawled immediately, but it does give you some influence.

    You can find this feature by clicking “Request Indexing” in the URL Inspection tool.

    Common Crawling Issues

    Ideally, your website and its individual pages will be easy for Google to crawl, render, and index.

    Unfortunately, this isn’t always going to be the case. It’s important to know not only how to investigate how Google sees your page, but also the common crawling issues Google runs into on a troublesome page. Here are a few common issues to look for in your diagnostic process.

    Loading Errors

    Loading errors are a common crawling issue for web crawlers and indexation engines alike. 404 (Page not found) errors, as well as other 4xx and 5xx status codes, bring web crawlers to “dead ends” on your website. URLs with these status codes are also non-indexable, making them ineligible for search results.

    • Potential solution: clean up broken links on your website and ensure all pages are returning 200 (OK) status code.
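A simple way to spot these is to record the status code for each URL and flag anything that is not a 200. The Python sketch below shows the idea with the standard library only. The function names and example URLs are hypothetical, `fetch_status` makes a real network request, and some servers respond differently to HEAD requests than to GET.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status for a URL (redirects are followed automatically)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses are raised as exceptions; the code is still useful.
        return error.code


def classify_status(code: int) -> str:
    """Group a status code into the buckets discussed above."""
    if code == 200:
        return "ok"
    if 300 <= code < 400:
        return "redirect"
    if 400 <= code < 500:
        return "client error"
    if 500 <= code < 600:
        return "server error"
    return "other"


def find_problem_urls(statuses: dict) -> dict:
    """Keep only the URLs that did not return 200 (OK)."""
    return {url: classify_status(code) for url, code in statuses.items() if code != 200}


# Example with status codes gathered earlier (placeholder URLs).
crawl = {"https://mydomain.com/": 200, "https://mydomain.com/old-page": 404}
print(find_problem_urls(crawl))  # {'https://mydomain.com/old-page': 'client error'}
```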

    Dynamically Loaded Content

    Dynamically loaded content is often generated via JavaScript. This type of content can be fine for Googlebot, but it can lead to problems if the JavaScript fires after the initial page load. This could lead to Googlebot not “reading” the dynamic content when it first loads the page.

    • Potential solution: aim to use static content where possible. If you’re unable to, use URL Inspection to ensure Google is “reading” your dynamically loaded content.
    • Additional pro tip: if you’re unfamiliar with JavaScript and its impact on your pages, you can also use a free tool like What Would JavaScript Do to see what elements of your page are generated via JavaScript.
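One way to make that URL Inspection check less manual is to compare two copies of the page: the rendered HTML as you see it in your browser (for example, copied from Chrome Developer Tools) and the crawled HTML copied from “View Crawled Page.” The Python sketch below lists text that appears in the first but not the second. It is a rough, text-only comparison with made-up function names and sample markup, not a reproduction of how Google renders pages.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text chunks outside of <script> and <style> blocks."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())
        if text and not self._skip_depth:
            self.chunks.append(text)


def visible_text_chunks(html: str) -> list:
    """Return the visible text of a page as a list of whitespace-normalized chunks."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return parser.chunks


def missing_from_crawl(rendered_html: str, crawled_html: str) -> list:
    """Text present in the rendered page but absent from the crawled HTML."""
    crawled = set(visible_text_chunks(crawled_html))
    return [chunk for chunk in visible_text_chunks(rendered_html) if chunk not in crawled]


# Hypothetical example: reviews are injected by JavaScript after page load.
rendered = '<div id="app"><h1>Store</h1><p>Loaded reviews: 42</p></div>'
crawled = '<div id="app"><h1>Store</h1><script>loadReviews()</script></div>'
print(missing_from_crawl(rendered, crawled))  # ['Loaded reviews: 42']
```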

    Site Speed and Crawl Budget

    Site speed is a commonly known metric and is widely thought of as a user behavior metric. The faster a page loads, the better the experience, right?

    That’s certainly true, but site speed also impacts a web crawler’s ability to crawl (and index) a website. We like to think Google has unlimited capability, but it does work within a “crawl budget” for each website to ensure that it’s not overloading the site in its attempt to crawl and index it.

    It’s important to note that “crawl budget” isn’t a fixed number for a website. A site’s crawl budget is largely dictated by the site’s ability to handle Googlebot’s requests without being overwhelmed or slowing its server response time.

    • Potential solution: optimize site speed as much as possible. Strategies such as caching and CDNs could be potential solutions to improve site speed when delivering large files.
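Crawl budget itself isn’t exposed as a number, but your server’s access logs can give a rough sense of how often Googlebot actually visits. The Python sketch below counts Googlebot requests and unique URLs per day. It assumes the common “combined” log format used by Apache and Nginx, so the regular expression may need adjusting for your server. It also identifies Googlebot by user agent string only, which can be spoofed; Google documents a reverse DNS check if you need to verify the crawler.

```python
import re
from collections import Counter, defaultdict

# Assumption: logs use the "combined" access log format.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] "(?:\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_activity(lines) -> dict:
    """Count Googlebot requests and unique URLs per day from access log lines."""
    requests = Counter()
    paths = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "Googlebot" in match.group("agent"):
            day = match.group("day")
            requests[day] += 1
            paths[day].add(match.group("path"))
    return {
        day: {"requests": requests[day], "unique_urls": len(paths[day])}
        for day in requests
    }
```

To use it on a real file, pass an open file handle (for example, `googlebot_activity(open("access.log"))`, where the file name is a placeholder) and compare the daily counts over a few weeks.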

    Linked and Large Files

    Google doesn’t just crawl webpages. It also crawls any additional resources, like PDFs, that your site may be linking to. While Google does crawl these relatively well, just remember that these files also count toward your website’s crawl budget (as described above).

    • Potential solution: we recommend keeping your important content on HTML pages to ensure Google can (1) efficiently crawl these pages, and (2) ensure smaller load times for Googlebot.

    How to Resolve Crawling Issues

    If you’re experiencing crawling issues, there are a variety of methods you can use to resolve them. Here are some of the more common issues and how to resolve them.

    Correct Poorly Optimized Content

    If your content is not well optimized, there is the potential that Google will choose to ignore or not index it properly. This includes issues like:

    • Duplicate content (internal and external)
    • Hidden content
    • Excessive keyword manipulation, including keyword stuffing and inauthentic language
    • Link overuse and unnatural anchor link usage

    In short: ensure your content and pages are natural and abide by Google’s best practices for writing content for your target audience.

    Resolve Unintended Blocking Meta Tags

    It’s always important to check the affected page for any conflicts with the meta tags referenced above, especially the noindex directive. If the noindex directive is found on the page, Google will not index the page or make it eligible for search results.

    Optimization

    There are multiple ways to encourage Google to find your webpages (and your content!), including several on-page optimizations and off-page optimizations.

    Some high-level on-page optimizations include:

    Some off-site optimizations include:

    The State of Google Crawls

    Google regularly provides updates on its web crawler, Googlebot. And more importantly, it’s constantly tweaking its ranking algorithm to serve higher-quality results for a user’s search.

    It’s important to keep up with these updates to ensure Google is not unintentionally blocked from reading your website or webpages. Need help following these updates? Contact a member of our SEO team for the latest SEO news and strategies to ensure your website stays visible on Google.

    About the Author

    Operations Manager

    Jared has a passion for combining data with SEO strategy to uncover new opportunities and unlock potential.
    Discussion

    2 Comments

    • thank you for the article. is there a way to know the google crawl budget? or is it a black box?

      • Hi Mark – sorry for the late response!

        It’s a bit of a black box — BUT! The best advice I can give you would be to check your web logs to get a better sense for how often Googlebot is pinging your website. Using these files, you should be able to estimate, on average, how often Google crawls pages on your website (unique daily/monthly URLs crawled).

        Cheers!

        – Jared


    About FourFront

    FourFront uses data to provide digital marketing and market research services. In our blog, our team of analysts, strategists, and engineers provides tips, insights, analysis, and commentary.
