Google Webmaster Tools (hereafter GWT) provides a number of free online instruments for web site management. It is one of several applications accessed through Google Accounts (see references).
Businesses today need prospective customers to be able to find them on the internet through search engines. Google is the most popular search engine at the moment. I wanted to learn to use GWT to better understand site indexing and to make my sites search-friendly, specifically Google-friendly. I began by signing up for a free Google Account with an e-mail address and password.
The first stop after log-in is the Dashboard. Here I find a chart listing the sites I have elected to manage with GWT. (At the moment I have two sites listed here.) The chart shows the number of sitemaps that I have submitted for each site, and whether each site has been verified.
I select the site whose information I wish to examine. Clicking on the site name brings up the Summary page. At the top are tabs for the other folders. Each folder has a selection of tools. Throughout the GWT pages are links to download the data tables, but when I tried to do so, the result was an Internet Explorer error. Another annoyance is the security popup that each page of information triggers.
On each page are links to the Terms of Service, the Privacy Policy, Webmaster Central (which leads to tools, discussion forums, FAQ links, etc.), and Google's blog for webmasters (which seems to be written by Google staff, rather than by the subscribing webmasters).
Website Diagnostics
Indexing information is provided on the Summary page: date of the most recent home page crawl by the Googlebot, and whether pages from the site are or are not included in Google's index.
There is also a table of web crawl errors: HTTP errors, Not found, URLs not followed, URLs restricted by robots.txt, URLs timed out, and Unreachable URLs. If there are any errors, details are provided. For example, if the Googlebot could not locate some of the pages listed in the sitemap, there will be a link to a list of those URLs. The errant URL may have been listed incorrectly in the sitemap, or the page may have been down when the 'bot visited.
This page also links to several Google help center articles.
Also available in the Diagnostic folder:
Crawl errors: Web crawl and Mobile web
Tools: robots.txt analysis, Manage site verification, Crawl rate, Preferred domain, Enhanced Image Search, and URL Removals
Web crawl lists URLs from the site that Googlebot had trouble crawling. You can select a date range to check. Mobile web reports Mobile CHTML and Mobile WML/XHTML crawl errors.
The robots.txt analysis page shows the text of the cached robots.txt file, if any. This file blocks searchbots' access to specific URLs. Reasons to block such access might include guarding your e-mail link from spammer 'bots (although ill-behaved 'bots may simply ignore the file), or speeding up the indexing process. There is a box on the page to test your URLs against current and prospective robots.txt files.
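The same kind of test can be tried offline. The following is only a minimal sketch using Python's standard urllib.robotparser module; the robots.txt rules, the example.com URLs, and the helper name check_urls are made up for illustration, and the standard-library parser may not interpret every rule exactly as Googlebot does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /contact.html
"""


def check_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return a dict mapping each URL to True (allowed) or False (blocked)."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(user_agent, url) for url in urls}


# Hypothetical URLs to test against the rules above.
results = check_urls(
    ROBOTS_TXT,
    [
        "http://www.example.com/index.html",
        "http://www.example.com/private/data.html",
        "http://www.example.com/contact.html",
    ],
)
for url, allowed in results.items():
    print(url, "allowed" if allowed else "blocked")
```

Editing ROBOTS_TXT and re-running gives a quick way to try out a prospective file before uploading it.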
Site Verification with GWT
A site can be verified in either of two ways: by adding a Google meta tag to the home page, or by uploading a blank HTML file with a name Google provides.
The Crawl rate page shows recent Googlebot activity: the minimum, average, and maximum number of pages crawled per day, the number of kilobytes downloaded per day, the time spent downloading a page (in milliseconds), and the current crawl speed. Here you can also set a faster, normal, or slower speed.
Preferred domain lets you specify whether the site's URLs should be displayed in the index with or without the www prefix, or leave that choice unset.
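To make the www versus non-www distinction concrete, here is a small sketch that rewrites a URL to one form or the other. It only illustrates the idea and says nothing about how Google applies the setting; the function name to_preferred_host and the example.com URLs are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def to_preferred_host(url, use_www=True):
    """Rewrite url so its host has (or lacks) the leading 'www.' prefix.

    Any user:password part of the URL is dropped; this is a sketch only.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Strip an existing prefix first so that it is never doubled.
    bare = host[4:] if host.startswith("www.") else host
    new_host = "www." + bare if use_www else bare
    netloc = new_host if parts.port is None else f"{new_host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(to_preferred_host("http://example.com/page.html"))
print(to_preferred_host("http://www.example.com/page.html", use_www=False))
```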
Enhanced Image Search is a new, experimental feature. It connects you with a group of users who will review each other's images and select names for them.
URL Removals - To remove content from the Google index, you are instructed first to do one of the following: make requests for the page return an HTTP status code of either 404 or 410, block the page using a robots.txt file, or add a meta noindex tag to the page. Your content will then be removed from the index the next time Google crawls the site. To expedite the removal after using one of these methods, this page offers an automated New Removal Request tool.
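The first of those methods, returning a 410 (Gone) status, might look like the sketch below, which uses Python's built-in http.server module. It is not a production setup: the REMOVED_PATHS set, the status_for helper, and the handler name are hypothetical, and a real site would normally configure this in its web server instead.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical paths of pages that have been taken down for good.
REMOVED_PATHS = {"/old-catalog.html", "/discontinued/widget.html"}


def status_for(path):
    """Return 410 (Gone) for removed pages and 200 (OK) for everything else."""
    return 410 if path in REMOVED_PATHS else 200


class RemovalAwareHandler(BaseHTTPRequestHandler):
    """Answers GET requests with the status chosen by status_for()."""

    def do_GET(self):
        status = status_for(self.path)
        body = b"Gone" if status == 410 else b"OK"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, pass the handler to http.server.HTTPServer, for example HTTPServer(("localhost", 8000), RemovalAwareHandler).serve_forever(), and request one of the removed paths.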
Making Sense of Google Statistics
Google statistics has four pages: Crawl stats, Query stats, Page analysis, and Index stats.
Crawl stats - These statistics provide distribution information for the pages Google has crawled for the google.com index. They show whether most pages of the site have high, medium, low, or no PageRank, and which page has had the highest PageRank recently.
Query stats - This page has two columns. Top search queries lists the search terms that most often returned pages from the site. Top search query clicks lists the queries that actually directed traffic to the site. Both lists give the highest position any page from the site reached in search results for each query.
You can elect to view all searches or just web searches; also you can view the searches from all locations or choose one of several specific geographic regions. For example, you can look at the top query and click search strings from web searches by users in the UK. I was a little disappointed to see China was not among the locations listed, as the company I work for has potential customers in China.
All data is supposed to be averaged over the last 7 days, but I noticed my query stats have not been updated in several weeks. It might be a helpful tool if Google kept it up to date.
Page analysis - Phrases, Keywords and Content. These three statistics tables show how the Googlebot sees the site: Phrases in external links to the site (11 in the case of the site I am now checking); Keywords in the site's content (100) and in external links to the site (20); and Content: distribution of page types (e.g. text/html, application/pdf) and encoding.
Index stats - This page claims to show how your site is indexed. There are links to some samples of index stats, although the results are incomplete: indexed pages in the site; pages that link to the site's front page; the current cache of the site; information Google has about the site; and pages on the internet that Google considers similar to the site (similar keywords, etc.).
From postings in the Google Webmaster Help - Discussions forums, I gather that I am not the only one who found the incomplete and out-of-date statistics results more confusing than helpful. Knowing that Google considers incoming links in its ranking decisions, I was alarmed to see our list of links to the site's front page dwindling. The links were disappearing from the Google search results page (link:yoururl) pulled up by "pages that link..." I knew those links were still active - was Google not recognizing those links anymore? Had the industry directories in which we were listed somehow gotten in trouble with Google? Was our site going to lose its precarious position in the search results?
A visit to the Webmaster Tools forum enlightened me - SEO oldtimers (and some Google moderators as well) were busy answering several postings per day regarding similarly troubling results in the index stats. The index stats were incomplete and out of date, they reassured anxious posters. We should ignore them and refer instead to the External links table in the Links tabbed section. Evidently our links were OK.
The forum is a good place for learning more about useful search optimization methods as well as potential pitfalls. A number of the contributors also share links to their favorite web development resources.
Links
External links - The table on this page shows which pages on the site have links pointing to them from other sites, and how many. You can click the number in the External links column to see a table with the links to the page, showing when they were last found by Google. It seems to be updated every few months.
Internal Links - This table provides a list of pages on the site that have links pointing to them from other internal pages. It includes internal links and links from subdomains.
Google XML Sitemaps
The Sitemaps page displays the file name of the sitemap(s) and date submitted, when last downloaded by Google, status, and number of URLs.
Google accepts a variety of sitemaps: a simple text file listing site URLs, an XML file, or a map compiled with the Google sitemap generator, which is a Python script. I find the best tool to use (for larger sites) is http://www.xml-sitemaps.com/
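For a small site, the first two formats are simple enough to produce by hand or with a few lines of code. The sketch below is an illustration only: it assumes the sitemaps.org 0.9 namespace for the XML form, emits only the required loc element for each page, and uses hypothetical function names and example.com URLs. It is not the Google sitemap generator.

```python
import xml.etree.ElementTree as ET

# Namespace of the sitemaps.org 0.9 protocol (assumed here).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_text_sitemap(urls):
    """Plain-text sitemap: one URL per line."""
    return "\n".join(urls) + "\n"


def build_xml_sitemap(urls):
    """XML sitemap: a <urlset> with one <url><loc>...</loc></url> per page."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & in the URL automatically.
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages for illustration.
pages = ["http://www.example.com/", "http://www.example.com/products.html"]
print(build_text_sitemap(pages))
print(build_xml_sitemap(pages))
```

The resulting text would be saved to a file on the site, and that file's address submitted on the Sitemaps page.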
Reference Links
in COIN74A Spring 2007 as a Knowledge Quest Assignment, and augmented for this course. We are both grateful and indebted to Helen for her work on this assignment and help with the lesson.
Copyright © 2008 - 2009 Robert D. Cormia - September 10, 2008