The Official Google Webmaster Central blog has posted steps on how to get reincluded in the search results should you find yourself in the unfortunate circumstance of being excluded. Mariya Moeva, of the Search Quality Team, hosts an entertaining how-to vid explaining the steps you should take when your site is Google-less. For those who can't or don't want to view the video, look below for the steps in text.
1. Check your access. Log into your Webmaster Tools account and check the Overview page to see what happened when Googlebot visited your site last. Also, check your robots.txt file to make sure there aren't any pages blocked that you want seen by Google.
2. Check your messages. There could be a message in your Message Center inbox of your Webmaster Tools account regarding your site.
3. Read the Guidelines. Make sure you know what Google does and does not allow for sites it lists in its search results.
4. Help Group. When all else fails, join the webmasters help group where other webmasters and Googlers can help determine what's going on.
5. Fix your site! Once you know what's wrong, fix your site!
6. Submit a Reconsideration Request. After you've made the fixes, submit a request for Google to check your site again.
Have you ever submitted a reinclusion request? Tell us about your experience in the comments.
Related Reading: Google Updates SEO Recommendations Article
Posted by Nathania Johnson at 10:15 AM | Permalink | Comments (1)
Google has updated its article entitled, "What's an SEO? Does Google recommend working with companies that offer to make my site Google-friendly?" Included in the update are the benefits of SEO as well as guidelines when choosing an SEO company or consultant.
The benefits mentioned in the article are:
Google also offers up 6 questions to ask a potential SEO vendor, but back in March, our own Marty Weintraub posted 48 questions you should consider when signing up for search marketing services. And earlier today, Aaron Shear discussed upsells agencies use to keep clients on board.
When hiring an SEO agency, it's always important to know enough SEO to make sure your vendor is pursuing the best practices. Google's article is a good place to start and of course, stay tuned to Search Engine Watch for news and tips in the SEO industry.
Posted by Nathania Johnson at 11:24 AM | Permalink | Comments (0)
Set the date aside, July 8th, if you want to get the inside scoop on Google's website tools. As announced today over at the Official Google Blog, they will be holding a webinar about Google Analytics, Website Optimizer and Webmaster Tools.
These Google products have become an invaluable set of tools for most serious web owners and marketers and it will be a great opportunity to see if there are any methods one is not using at the moment.
It will be interesting to see how many people sign up and whether Google can handle the potentially huge crowd for this event.
Posted by Frank Watson at 3:00 PM | Permalink | Comments (0)
Over at the Google Webmaster Central blog, Search Quality Team member Sven Naumann is tackling the issue of duplicate content. Naumann says there are two primary types of duplicate content, within a domain and cross-domain, and offers up tips in how to deal with each.
Within a Domain
This type of duplicate content is when the content from one page appears on other pages within your site. In this case, most webmasters or site owners have a preference as to which page they want to rank. Naumann offers up the following tip, "Include the preferred version of your URLs in your Sitemap file. When encountering different pages with the same content, this may help raise the likelihood of us serving the version you prefer."
Cross-Domain
Cross-domain duplicate content is when content from your site appears on other sites, usually through syndication or blogs that scrape content. When it comes to syndication, asking your partners to link back to your page is a good way to help Google know that your site is the original source. As for scraped content, Naumann insists that Google is good at knowing what's scraped and what's real: "You shouldn't be very concerned about seeing negative effects on your site's presence on Google if you notice someone scraping your content."
Still, once in a while, scraped content may rank higher than your page. In such an instance, Naumann suggests the following:
Wrapping up, Naumann assured webmasters and SEOs that, "In the majority of cases, having duplicate content does not have negative effects on your site's presence in the Google index. It simply gets filtered out."
What do you think about this duplicate content post on the Google Webmaster Central blog? Does it line up with your experience in dealing with duplicate content? Share your thoughts in the comments.
Related Reading:
Adam Lasnik comments on Spam Complaints and Dupe Content
Large Enterprise SEO: Content Development
Posted by Nathania Johnson at 10:12 AM | Permalink | Comments (0)
Over at the Google Webmaster Central blog, Maile Ohye is giving insight into IP Delivery, Geolocation and Cloaking.
IP Delivery is serving up targeted content to users based on their IP address. This is ok as long as you treat Googlebot the same as you would a user with a similar IP address.
Geolocation is serving up content based on a user's cookie data, login info or IP address. Ohye says, "The key is to treat Googlebot as you would a typical user from a similar location, IP range, etc. (i.e. don't treat Googlebot as if it came from its own separate country—that's cloaking)."
Cloaking is not ok because it shows humans and the Googlebot different content. Ohye explains, "If the file that Googlebot sees is not identical to the file that a typical user sees, then you're in a high-risk category. A program such as md5sum or diff can compute a hash to verify that two different files are identical."
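The md5sum-style check Ohye mentions can be sketched in a few lines of Python. This is only an illustration, assuming you have already saved two copies of the same page, one as served to Googlebot and one as served to a typical visitor. The helper names (`md5_of_file`, `files_identical`, `demo`) and the file names are hypothetical and not part of any Google tool.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_identical(path_a, path_b):
    """True when both files produce the same MD5 digest."""
    return md5_of_file(path_a) == md5_of_file(path_b)


def demo():
    """Write two sample pages (hypothetical names) and compare them."""
    with tempfile.TemporaryDirectory() as tmp:
        bot_copy = os.path.join(tmp, "as_googlebot.html")
        user_copy = os.path.join(tmp, "as_visitor.html")
        for path in (bot_copy, user_copy):
            with open(path, "wb") as handle:
                handle.write(b"<html><body>Same content</body></html>")
        return files_identical(bot_copy, user_copy)
```

In practice you would call `files_identical` on your own two saved copies; a `False` result means the files differ and the page is worth a closer look.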
To sum up, treat Googlebot the same way you would your human visitors!
Ohye also discussed Google News' first click free policy. Basically, news sites with premium content can have those paid pages indexed by Google News if they allow visitors to view their first click on the page free. But any clicks deeper into the site are permitted to display a login or payment request first.
Posted by Nathania Johnson at 9:28 AM | Permalink | Comments (1)
When you have parts of your site that you don't want the search engine spiders to index, you let them know using a document called robots.txt. But for the coding-challenged, creating that document has not always been easy. Thankfully, Google has created a robots.txt generator as part of its Webmaster Tools.
Once your document is created, you can test it with the robots.txt analysis tool. Google points out that not every search engine recognizes robots.txt. They recommend securing truly sensitive material with password protection.
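For readers who want to check a robots.txt file outside of Webmaster Tools, here is a rough sketch using Python's standard `urllib.robotparser` module. It is not Google's analysis tool and may not match Googlebot's behavior in every edge case. The sample rules and the example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: block two directories for all spiders.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /tmp/
"""


def build_parser(robots_txt):
    """Parse robots.txt text and return a ready-to-query parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# Ask whether a given spider may fetch a given URL under these rules.
for url in (
    "http://www.example.com/index.html",
    "http://www.example.com/private/report.html",
):
    allowed = parser.can_fetch("Googlebot", url)
    print(url, "allowed" if allowed else "blocked")
```

As the post notes, a rule like this only asks well-behaved spiders to stay away; it does not protect the content itself.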
It's a big week for the Google Webmaster team. Today, they're hosting the first ever Google Webmaster chat, which begins at 12pm EST/9am PST. The chat will feature a site clinic and also discuss image optimization.
Posted by Nathania Johnson at 11:04 AM | Permalink
For the first time ever Google will host a worldwide live chat, where everyone will have a chance to hear and see Google Webmaster Central answer your questions.
If you own a blog, a Web site, or just want to move your company into the 21st century, do not miss this call.
Here are the details.
WHEN: Friday, March 28, 9am PDT / noon EDT / 16:00 GMT
Google will pay for the call. It's free.
No strings attached.
The only 4 things you need to join the Google WebEx chat:
1. Phone
2. "Sufficiently-modern" Web browser
3. Internet connection
4. Search Engine Watch Membership
(Okay - just kidding (Google) about #4 - but a Search Engine Watch membership will help you prepare to ask Google the toughest search engine optimization (SEO) and paid search (PPC) questions).
Seriously - all you need is a phone, browser and Internet connection.
The call will be hosted by Google Search Evangelist and SES London speaker, Adam Lasnik, the heir-apparent to Google's one-and-only Matt Cutts.
We know this isn't an early April Fool's Joke from Google because it's signed off by Adam himself - and we quote:
"Talkatively yours, Adam and the English Webmaster Help Guides"
We love the Google English Webmaster Help Guides. They have such cool accents. (Maybe Google will let the American Webmaster Help Guides answer a few questions too.)
Stay tuned for more on this historic Google event. All the techno-geek requirements for the WebEx chat? After the jump.
Google's WebEx Requirements:
Windows 98, 2000, XP, 2003 and Vista
* Internet Explorer 6/7
* Firefox 2
* Mozilla 1.7 or higher
* Netscape 8.1 or higher
* JavaScript and cookies need to be enabled
* Recommend ActiveX be enabled for Internet Explorer
* Vista supports Internet Explorer 7 and Firefox 2 browsers only
Mac OS X 10.3, 10.4, 10.5 (PowerPC/Intel)
* Safari 1.3 (Mac OS 10.3)
* Safari 2.0 (Mac OS 10.4)
* Safari 3.0 (Mac OS 10.4, 10.5)
* Firefox 2
* JavaScript and cookies need to be enabled
* Requires Apple Java Runtime Environment (JRE) 5.0 or higher
* No support for Remote Access
Solaris 10 (SPARC/x86)
* Mozilla 1.7 or higher
* Firefox 2.0 or higher
* JavaScript and cookies need to be enabled
* Requires Sun Java Runtime Environment (JRE) 5.0 or higher
* No support for Sales Center and Remote Access
HP-UX 11.11 (PA-RISC)
* Mozilla 1.4 or higher
* Firefox 1.0 or higher
* JavaScript and cookies need to be enabled
* Requires Sun Java Runtime Environment (JRE) 5.0 or higher
* Only Meeting Center supported
Ubuntu 7.04, Red Hat 4.0, SuSE 10.0 Linux
* Firefox 2
* Mozilla 1.7 or higher
* JavaScript and cookies need to be enabled
* Requires Sun Java Runtime Environment (JRE) 5.0 or higher
* No support for Sales Center and Remote Access
AIX 5L 5.3
* Mozilla 1.4 or higher
* Firefox 1.0 or higher
* JavaScript and cookies need to be enabled
* Requires IBM Java Runtime Environment (JRE) 5.0 or higher
* Only Meeting Center supported
Posted by Kevin Heisler at 12:33 AM | Permalink
Google Webmaster Central and the Google Search Quality team have a new ally in the global fight against Web spam, cloaking, paid links, link farms and other non-sanctioned schemes: the Security Group at the University of Cambridge.
Web sites created by Chicago-based Privila were banned by Google earlier this month after Steven Murdoch of University of Cambridge Computer Labs exposed an alleged cloaking scheme by the content network. Steven Murdoch blogged about his findings on March 6th. Two days later Google removed the sites from its index.
It seems that people and spiders were seeing different pages when they visited sites such as soccerlove.com, ammancarpets.com, and canadianbattery.com. People were seeing display ads created by unpaid interns. Google spiders were seeing keyword-rich articles also churned out by unpaid interns. Windows Live and Yahoo! were seeing neither ads nor articles.
Privila already has added the articles back to the sites but the sites are not yet re-indexed.
Murdoch discovered the cloaking after the computer lab where he works received a link exchange spam email from Privila.
Posted by Nathania Johnson at 2:53 PM | Permalink
Google has finally added the ability to tell it where your site is located. A site's location, a much-discussed topic involving domain extensions, location of hosting and other factors, can finally be specified directly.
Vanessa Fox spotted it and posted a pic of the interface change here.
Posted by Frank Watson at 1:38 PM | Permalink
The Google Webmaster Tools team announced changes to Webmaster Tools. The biggest change was the addition of subscriber stats from various Google services, such as Google Reader, iGoogle, and Orkut.
One of the biggest questions in the comments on the post was about a potential integration with Feedburner itself. This would provide great additional value, and many users seem to want that. This would provide subscriber data on services other than Google services.
In one of the comments, Google employee Susan Moskwa offered some clarification on this point:
"the subscriber number in webmaster tools reflects all known subscribers to feeds that are on your verified domain only. FeedBurner sees all requests for your feeds, whether they're to www.yourdomain.com/feed or feeds.feedburner.com/yourfeed. If a Google user is subscribed to the FeedBurner version of your feed, the webmaster tools number will not include them."
In addition to the subscriber stats, the user interface was revamped. The tabbed look is gone, and now there is a left side menu that expands and contracts dynamically, depending on where you go in the site. When I first saw it, it took some getting used to. I suspect, however, that it will ultimately be easier to use, and it probably makes it easier for the menu to be expanded to include additional features in the future.
Posted by Eric Enge at 10:52 AM | Permalink
Google has added a malware tool to Webmaster Central tools, their blog reported.
"If you find that your site is affected by malware, either through malware-labeled search results or in the summary for your site in Webmaster Tools, we've streamlined the process to review your site and return it malware-label-free in our search results," the blog states.
Posted by Frank Watson at 5:15 PM | Permalink
I don't know if it is the popularity of Matt Cutts (undeniable if you have ever watched the comet trail he has behind him at any search event) or a true desire to input ideas for Google's next Webmaster Central feature, but Matt has received over a thousand responses in under 24 hours and the news has not even been fully disseminated yet.
Matt added a poll to his call for feature suggestions and as of this blog entry there were over 1200 votes and 107 comments.
The poll options are:
* More information about penalties or other scoring issues
* Tools for detecting or reporting duplicate content
* Show links on your site that are broken
* Score the crawlability or accessibility of pages
* Show PageRank numbers instead of none/low/medium/high
* Tell Google the correct country or language for a site
* Tool to help move from one domain to a new domain
* Diagnostic wizard for common site problems
* Some type of rank checking
* A way to list supplemental result pages
* Show causes of 404 errors
* Option to "disavow" backlinks from or to a site
* Fetch a page as Googlebot to verify correct behavior
* Tell Google a parameter doesn't matter
* Show pages that don't validate
* Ability to show/download all pages from a site (e.g. if your server crashed)
* More documentation and examples
* Integrate "Add URL" feature
This is an interesting blog entry and many of the comments are well worth reading.
Posted by Frank Watson at 10:12 AM | Permalink
I did an interview together with Vanessa Fox, just before she left Google. We talked about what was going on with Google Webmaster Tools, features that people have been requesting, duplicate content, and the renaming of the re-inclusion request form to the reconsideration request form.
I was intrigued in particular by the renaming of the re-inclusion request form. Vanessa clearly defined this form as something to be used in the event that you had been violating a webmaster guideline, and that you had fixed it. Here is her exact wording:
"That form is really for situations where you have violated the guidelines in some way, and you've fixed it. Then you can use those forms to have someone take a look at it as opposed to just waiting over time for things to naturally pickup again."
It's a fascinating statement. It suggests that you can send in an appeal for any penalty that you have become subject to (once you have fixed it, that is), and the value of this appeal is not necessarily limited to appealing manually applied penalties. Keeping in mind Google's recent statements that ALL forms submitted with the reconsideration request form within Webmaster Tools (i.e. you are providing your identity as you make the submission) will get looked at, this is a powerful statement. Just be careful to really address the issues before you make the submission.
Posted by Eric Enge at 9:00 AM | Permalink
As expected, Google has added a paid links reporting form to its Webmaster Tools. Google has been warring against paid links that pass PageRank, such as those that don't use the nofollow attribute or some form of redirect.
Last month, Google engineer Matt Cutts offered some guidelines for evaluating paid links to see if they fit with Google's guidelines.
Google has not said exactly what it will do with the reports, other than saying, "We'll review each report we get and use this feedback to improve our algorithms and improve our search results. In some cases we may also take individual action on sites."
Posted by Kevin Newcomb at 12:04 PM | Permalink
Google's Webmaster Help group is getting some much-needed backup in the form of additional Google Search Quality team members. The site, manned by a team of Googlers led by Vanessa Fox and Adam Lasnik, will soon see more team members publicly answering questions on the group, according to a recent post by Lasnik.
The team has also been undergoing training to "bring this Google Group to the next level," including "online communications workshops, training in spotting issues, and even (eeek!) a review of important legal issues," Lasnik wrote.
Posted by Kevin Newcomb at 8:46 AM | Permalink
Google made separate announcements today upgrading its recently released anchor text reporting tools, and renaming Froogle to Google Product Search.
The new anchor text tools feature the return of the report on most common individual words in anchor text, an expansion of the number of phrases to 200 and of common words in anchor text to 100. It's also made the feature available to more webmasters.
Posted by Kevin Newcomb at 10:19 PM | Permalink
Stefanie of the Google Search Quality Team in Dublin posted yesterday an Update on SPAM Reporting. She details what Google does with the spam reports. Of note is the statement that: "Currently, we investigate every spam report from a registered user". A registered user means someone who files their report using the forms within Webmaster Central (you have to be logged in, so they know who files the report). You can file a report without being logged in, but there is no guarantee that it will be looked at.
There are two things that this tells us:
Of course, this is good news for those webmasters who adhere strictly to the Webmaster Guidelines. After all, it gives them a chance to level the playing field.
Posted by Eric Enge at 11:01 AM | Permalink
The instructions for how to set up a site targeted CPC campaign have been added to Google's FAQs. Discussion at various sites has begun.
Anyone else receive the invitation to test this yet? If so please share the info with us.
Posted by Frank Watson at 9:52 AM | Permalink
Google has enhanced the reporting capabilities in its Webmaster Tools to show full phrases being used in anchor text, instead of just keywords within the anchor text of links to a site, the company said on its Webmaster Central Blog last night.
Webmaster Tools previously had reports showing the top terms included in the anchor text other sites use to link to a site. So the list would include the terms "Search," "Engine," and "Watch" when it assessed this link: Search Engine Watch.
With the new reports, the entire phrase will be shown on the list, so the previous link would show up in reports as "Search Engine Watch," which is a much more useful bit of information.
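The difference between the two report styles can be illustrated with a short Python sketch. This is not how Google builds its reports. The sample anchor texts, the lowercasing step and the function names (`top_terms`, `top_phrases`) are assumptions made for the example.

```python
from collections import Counter

# Hypothetical anchor texts collected from links pointing at a site.
anchor_texts = [
    "Search Engine Watch",
    "Search Engine Watch",
    "SEW blog",
    "search engine news",
]


def top_terms(anchors):
    """Old-style report: count each individual word in the anchor text."""
    words = Counter()
    for text in anchors:
        words.update(text.lower().split())
    return words


def top_phrases(anchors):
    """New-style report: count each full anchor text phrase as one item."""
    return Counter(text.strip().lower() for text in anchors)


print(top_terms(anchor_texts).most_common(3))
print(top_phrases(anchor_texts).most_common(3))
```

The word-level count cannot tell you whether "search" arrived as part of "Search Engine Watch" or "search engine news", while the phrase-level count keeps that context.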
The report is available to verified webmasters who have logged into Google's Webmaster Tools, under the "Statistics" tab, in the "Page Analysis" section. The list of top 100 anchor text phrases can be viewed as a table or downloaded as a csv file.
It's useful to see what anchor text is being used to link to your site, as it provides some insight into what people think of your site, or why they are linking. The anchor text also affects Google's ranking algorithm, so since lots of other sites link to our blog with the term "SEW blog" as anchor text, that affects results for a search on [SEW blog], causing this site to show up at the top of those results.
Posted by Kevin Newcomb at 8:06 AM | Permalink
Is this worth a 4.0 on the SEO Richter scale? Probably not. Just a rumble really, but oh how nothing shakes up the SEM industry and gets SEOs chatting like a nice bug in Google search results. Bring up the topic of the Supplemental Index and duplicate content, and the story gets even juicier.
As has just been confirmed by Google's Vanessa Fox, there is in fact something amiss with the current "site:" command, which is being rectified 'as quickly as possible', and this is merely the result of a display issue that shouldn't have any impact on search queries or ranking. (Special thanks to Vanessa, for working with us on sorting out this issue and finding a solution so quickly!)
But let's dig deeper into why this is such a big deal in the SEO world.
The "site:" command tells you how many of your site's pages are indexed in Google. In Google's Webmaster Central, the official syntax is "site:domain.com", and many SEO experts look at this as a real number.
So when Google suddenly starts to return disparate results for that command, it raises a red flag in the industry, and the conspiracy theories fly. For SEOs and webmasters, the questions that immediately come to mind are along these lines:
Probably nothing to raise your blood pressure over, but definitely this glitch is an anomaly in Google SERPs.
As is well documented here at SEW and other sites around the Web, typing "site:www.domain.com", "site: www.domain.com", or "site: domain.com" will return drastically different results. Note the differences when using a space after the colon, as well as when using the www vs. non-www version of a domain.
At SEW, we were alerted to this problem yesterday when the effervescent David Naylor posted that something was amiss with the results for SEW. The "site:" command site:searchenginewatch.com shows only 1 page, with "about 268" similar pages whose results are omitted.
Rest assured, at SEW, we do still have a vibrant pulse, and have not experienced any significant drops in traffic due to this problem. So, it's too early to plan a funeral. I am happy to report that traffic is normal at Search Engine Watch. In fact, it has actually been growing fairly steadily since January 1, and that deserves a post of its own.
As it turns out, Dave Naylor was not the first to discover this problem. As Danny Sullivan points out in his SEL post, Webmaster World has had a discussion going on this for almost a month now. Several large, authority sites, with total numbers of indexed pages reaching into the tens or hundreds of thousands, were seeing this result as well.
Because of the strange coincidences in the number of results, Danny Sullivan does get credit for dubbing this the "About 260" problem. However, that may not be an entirely correct title, because in some datacenters, the result is "about 359" for the same search. Try the searches among different browsers (Firefox/IE) and with personalized search on/off. While some results are not dramatically different and still fall into the "About 260" category, other searches are up by at least 100 more results.
SEW blogger Eric Enge dug up similar examples of other authoritative sites exhibiting this problem:
Posted by Elisabeth Osmeloski at 2:44 PM | Permalink
In a post at Search Engine Roundtable, Barry Schwartz points out that Googler Adam Lasnik has been busy explaining the effects of the rel="nofollow" tag in some posts on the Webmaster Central group. In a post to the group on Tuesday, "Vanessa is confusing me ~ nofollow again...," he clarified that Google does not crawl nofollow links:
> Does Google crawl a rel="NOFOLLOW" tagged link and not give it credit,
> or does it just stop at the link and not visit that page unless it's
> found elsewhere?
As Aaron [Pratt] correctly noted, the answer is the latter :)
Google's handling of the tag was not made entirely clear when it was originally announced two years ago: "From now on, when Google sees the attribute (rel="nofollow") on hyperlinks, those links won't get any credit when we rank websites in our search results."
That same murky explanation is given in the Webmaster Help Center.
Lasnik and Matt Cutts have attempted several times to clarify exactly what Googlebot does with the tag.
Cutts stated explicitly that Google does not crawl nofollow links in July 2006, in his Bot Obedience: Herding Googlebot post: "At a link level, you can add a nofollow tag on the granularity of individual links to prevent Googlebot from crawling individual links (you could also make the link redirect through a page that is forbidden by robots.txt). Bear in mind that if other pages link to a url, Googlebot may find the url through those other paths."
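To see which links on a page carry the attribute, a site owner could run something like the following Python sketch. It uses the standard library `html.parser` module to sort links into followed and nofollow lists. The class name `LinkCollector` and the sample HTML are hypothetical, and this shows only how the attribute can be read, not how Googlebot works internally.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Sort <a href> links by whether rel contains "nofollow"."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollow = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # rel can hold several space-separated values, in any letter case.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_values:
            self.nofollow.append(href)
        else:
            self.followed.append(href)


# Hypothetical page fragment with one editorial and two nofollow links.
SAMPLE_HTML = """
<p><a href="http://www.example.com/editorial">Editorial link</a>
<a href="http://www.example.com/sponsor" rel="nofollow">Sponsored link</a>
<a href="http://www.example.com/comment" rel="external NOFOLLOW">Comment</a></p>
"""

collector = LinkCollector()
collector.feed(SAMPLE_HTML)
print("followed:", collector.followed)
print("nofollow:", collector.nofollow)
```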
Lasnik stepped in again last night to further clarify the issue in another post, If rel="nofollow" is becoming the norm. He notes that "nofollow links aren't listed any differently than other links in our Webmaster Tools backlinks section," and said that nofollow links will show up in search results using the "link:" operator.
Matt Cutts has posted quite a bit on the rel="nofollow" tag. He mentioned its purpose in stopping comment spammers in May 2006, and talked about its use with paid links in a post from September 2005 called Text links and PageRank:
"The nofollow tag allows a site to add a link that abstains from being an editorial vote. Using nofollow is a safe way to buy links, because it’s a machine-readable way to specify that a link doesn’t have to be counted as a vote by a search engine."
The tag itself has drawn much criticism for its inability to curb blog spam, as it was originally intended. Most recently, Search Engine Journal's Loren Baker is calling for an end to the nofollow tag, giving "13 Reasons Why NoFollow Tags Suck," and David Wallace at SearchRank agrees in his post, "NoFollow Tag is a Dismal Failure."
Posted by Kevin Newcomb at 12:39 PM | Permalink
Google has made some minor changes to its Webmaster guidelines, which appear to be aimed at clarifying policies rather than substantially changing them. The changes, as spotted by Philipp Lenssen, include a change in emphasis on the result of violating quality guidelines from removal from Google's index to a site's removal or being "otherwise penalized." Another change adds language to clarify that links are not the only element of PageRank, and removal of a phrase saying ranking of sites is completely automated.
In the comments on Lenssen's post, Google's Matt Cutts weighs in with a personal opinion that this is the result of a clean-up: "I wouldn't characterize this as a policy change. Some folks (including me) have been taking a fresh look at our web documentation to see if anything can be misunderstood. We'd like to minimize the chance for misinterpretation of any help documentation. I wouldn't be surprised to see similar small changes throughout our documentation to clarify various points," Cutts wrote.
In addition, Google spokesperson Victoria Grand told SEW: "Our policies are the same today as they were yesterday and before we made this change. We are just clarifying our documentation to prevent potential misunderstandings."
Posted by Kevin Newcomb at 2:14 PM | Permalink
As Kevin Newcomb mentioned yesterday, Danny Sullivan had an outstanding write-up about Google's enhancement of its "Link:" operator, which allows researchers to discover many of the links that Google has indexed as pointing to a particular URL: Google Releases New Link Reporting Tools.
Google will allow users of its Webmaster Central tools to see more thorough reports of inbound links as measured to domains and even particular pages. More information is also available at the Webmaster Central blog.
This underscores the importance of working with Google by signing up with Webmaster Central. Not only will it help to get important pages of a Web site indexed, but it will also assist webmasters in conducting important competitor analysis. In the past, many researchers have almost completely ignored the Google "Link:" command, or operator, since it is known that Google does not display all of the links it knows about. Others have continued to use it, thinking that the ones that Google shows "must be more valuable" than others.
This has in fact been an often discussed topic at the Search Engine Watch Forums, where a sticky thread discusses the topic of the difference in inbound link reporting at various engines, and reveals that the current link discovery tool of consensus choice is the one found at Yahoo! Site Explorer. Although it is unlikely that those many converts will now abandon Yahoo! to use only the Google enhanced version, this news has made many webmasters and search engine optimization specialists happy.
(Begin editorial) Our engineers at Avenue A | Razorfish love to use the Webmaster Central tools, but, believe it or not, sometimes have problems with getting clients to approve the use, since it requires verification code to be placed on the Web site. Perhaps if Google would be more open about sharing its information without requiring this code, it would get a better reputation with some marketers that feel that they require too much "inside information." Google has done a great job in helping webmasters with their Web sites, but still needs to improve its relationship and willingness to work with agencies and other SEO companies, in the opinion of some. (/end editorial)
Posted by Chris Boggs at 11:13 AM | Permalink
Is your site included in Google News? Is your site in English? If so, you just got new support from Google Sitemaps. You can submit your news articles for inclusion and also monitor crawling stats. More from Google in Introducing Sitemaps for Google News.
Posted by Danny Sullivan at 12:11 PM | Permalink
"Learn more about Googlebot's crawl of your site and more!" at the Official Google Webmaster Central Blog covers new features Google has added: visual charts to show Googlebot's crawling activity, expanded crawl rate support, inclusion in the image search labeling program and the number of URLs submitted. I talked with the Google Webmaster Central team earlier this week, and here are a few more details on some of the features.
To see Googlebot activity reports, go to Google Webmaster Tools, choose one of the sites you've verified, then pick the "Crawl rate" option on the Diagnostics tab. You'll get a chart showing how many pages Google has crawled per day over the past three months. For example, here's what it looks like for the Search Engine Watch Blog:
It's interesting to see visually how Google has backed off the number of requests over time. There's nothing I've done to do this, but it may reflect Google getting smarter about the fact that it doesn't need to revisit every page on the site so often. It could also be due to our server being less responsive (see below).
You can also see kilobytes downloaded per day, as well as the time spent downloading a page in milliseconds. The chart on that for us is really revealing:
You can see that our response time nearly doubled at the end of July. That's exactly when we left our servers at Jupitermedia, our old publisher, and switched to new ones with Incisive, our current publisher. Despite the slower time, I haven't noticed any drop in traffic from Google, so the slower responsiveness -- while not good -- hasn't been damaging. But if you did see a plunge in traffic, a chart like this might help you visually realize what might be wrong directly from Google.
At the bottom of the Crawl rate page is the ability to set how fast you want Google to crawl your site. This was introduced back in August, but now it's available to everyone using Google Webmaster Tools, not just some. In addition, Google has simplified the options from five to just three, Faster, Normal and Slower. Google said feedback suggested fewer options would be easier to understand.
Setting a crawl rate still doesn't guarantee that Google will hit your server faster or slower than normal, even if you request it. But Google said it is much more responsive to these requests now. In fact, it is so responsive that you need to renew your choice every 90 days. That's to prevent someone authorized on your account from telling Google to slam your server and then leaving, with Google continuing to do that forevermore.
Also on the Diagnostics tab, you'll find an Enhanced Image Search option. What's that about? For now, it simply means that images from your site will be available to those using the Google Image Labeler system, which we wrote about last month in "Google Images Labeler: Google's Challenge To Flickr?"
Not all images from Google Images are currently added to Google Image Labeler. Google said it currently uses a subset of pictures that it feels site owners would be amenable to having labeled. This new feature lets you explicitly tell Google you'd like to have your pictures play in the new program. More on this is covered in the help page about enhanced image search.
Finally, if you submit a sitemap to Google, it will now tell you the number of pages submitted in that sitemap. Why care? Apparently, at least one person did and requested the feature. As Google explains in that blog post, this person generated a sitemap automatically and so had no idea how many URLs he was spitting out in it. Now he -- and others -- can know.
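If you'd like to check that number yourself before submitting, a rough sketch along these lines would do it. It assumes the sitemap is a standard XML file with one `url` element per page; the function name `count_sitemap_urls` and the sample data are made up for illustration and are not part of any Google tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample sitemap, for illustration only.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/about</loc></url>
</urlset>"""


def count_sitemap_urls(xml_text):
    """Count the <url> entries in a sitemap given as XML text.

    The namespace is ignored on purpose, so the count does not depend
    on which sitemap schema version the generator declared.
    """
    root = ET.fromstring(xml_text)
    count = 0
    for element in root.iter():
        # Tags look like "{namespace}url"; keep only the local name.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "url":
            count += 1
    return count


if __name__ == "__main__":
    print(count_sitemap_urls(SAMPLE_SITEMAP))
```

Comparing your own count against the figure Google reports is a quick way to confirm that the whole file was read.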
Posted by Danny Sullivan at 7:04 PM | Permalink
Vanessa Fox announced at the Google Webmaster Central blog that the query stats within the statistics section of Google Webmaster Central will now be updated weekly, instead of every few weeks. The updates are most likely to occur each Monday, but no specific time was mentioned. There are more details on how Google calculates the query stats in Vanessa's post.
Posted by Barry Schwartz at 8:32 AM | Permalink
One of the things that came out of our Bot Obedience Course at SES San Jose last month was a wish that search engines would somehow make it possible for site owners to know that the spiders visiting them were "trusted" or "certified." Now Google's suggested one way this can be done.
Those blocking rogue spiders through IP filtering run the risk that they might accidentally keep some of the "good" bots out. If you don't know all the Google IP addresses, there's a chance you might reject a Google spider accidentally. That might cause your pages to be dropped from Google.
"How to verify Googlebot" from Matt Cutts at the Official Google Webmaster Central Blog covers a suggested technique to avoid this. Basically, all Google spiders will report they are from the googlebot.com domain. So do a reverse DNS lookup on the IP address. If it comes back as googlebot.com, then you're halfway there. Halfway? Yes, that's because people can lie about domain names. To avoid spoofers, you then have to do a forward lookup on the domain name you found to see if it resolves back to the original IP address.
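For those who think better in code, here is a minimal sketch of that two-step check in Python. It is not Google's code: the function name `is_verified_googlebot` and the two optional lookup parameters are made up for illustration (the parameters exist only so the logic can be tried out without real DNS). The sketch also assumes a forward lookup that returns the exact visiting address counts as a match.

```python
import socket


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if ip passes a reverse-then-forward DNS check.

    reverse_lookup and forward_lookup are hypothetical hooks; when
    omitted, the real resolver from the socket module is used.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    # Step 1: reverse lookup, IP address to host name.
    try:
        host = reverse_lookup(ip)
    except OSError:
        return False
    host = host.rstrip(".").lower()

    # The name must be googlebot.com or a subdomain of it, so a
    # lookalike such as "fakegooglebot.com" is rejected.
    if host != "googlebot.com" and not host.endswith(".googlebot.com"):
        return False

    # Step 2: forward lookup, host name back to IP addresses. Anyone
    # can fake a reverse record, but not where the real name points.
    try:
        addresses = forward_lookup(host)
    except OSError:
        return False
    return ip in addresses
```

Called with just an IP address, the function does live DNS lookups, so a site doing this for every request would probably want to cache the results.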
The blog post explains more, and it's going to make the most sense to tech-savvy webmasters who are implementing some type of IP filtering or blocking already. Not doing that? Then don't worry about this -- it's not really for you.
Down the line, perhaps we'll see solutions that demand less technical skill, for those sites getting slammed by bad bots but without IP filtering. But this is a great start for now.
Matt's also mentioned this on his personal blog, where people are commenting on the technique.
Posted by Danny Sullivan at 5:25 AM | Permalink
A new job opening from Google, Webmaster Trends Analyst. It's all about helping Google monitor what webmasters are upset or concerned about at forums, conferences and other venues. From the job description:
Responsibilities:
Sounds like a perfect job for Barry Schwartz! Of course, if I lose yet another news editor to a search engine, oy vey!
Posted by Danny Sullivan at 3:13 PM | Permalink
Seattle 24x7 has an excellent conversation with Vanessa Fox and Amanda Camp of Google on Google Webmaster Central and working at Google. Both women began working at Google in April of 2005 in Seattle. They discuss the conception of Google Webmaster Central (also known as Google Sitemaps). The discussion also goes into 20% time and recruiting women at Google. You can also see a picture of "Seattle's Sisters of Search."
Posted by Barry Schwartz at 9:02 AM | Permalink