September 14, 2005
Thoughts On & Poking At Google Blog Search
Chris covered the launch of Google's new blog search in today's SearchDay article, Google Launches Industrial Strength Blog Search. In this post, I want to add some of my own thoughts. I'll also be working up a rundown on reaction from others, and Gary may be adding his own thoughts as a postscript here or as a separate post. Top line thoughts? It's not spam free. I wish it were "full text" blog search to better represent the blog world. It's got a short memory, not going back past March 2005. But the backlink info looks good, certainly better than you'll get on Google itself.
- Chris mentioned this in his article, but I think it's worth stressing:
technically, this is FEED SEARCH. You are only searching through the
feeds that Google has found. Some blogs don't have feeds. Some feeds don't come
from blogs. Google understands these issues and figures that down the line, it may
have to make changes to turn this into a true blog search, if that's what's
intended.
- By default, sorting is by RELEVANCE, not DATE. If you are looking for the
latest posts on a particular topic, use the "Sort by date" link in the upper
right-hand corner. Unfortunately, you can't save this as a preference.
However...
- As Chris noted, you can have results constantly sent to you via a feed
alert. The feed links are at the bottom of each page. So if you wanted to know
the latest blogs mentioning Google, you'd search for that word, sort by date,
then subscribe.
- Want to know the latest backlinks to your blog? Use the link: command,
such as link:blog.searchenginewatch.com,
sort by date, then subscribe to a feed of that search. That shows all links to
your domain, to any page anywhere on your blog and will send you the newest
ones.
- Want to know the latest backlinks to a particular post? Use the full page
address, such as
link:blog.searchenginewatch.com/blog/050831-091033. That brings back
matches linking just to that page.
- Don't want to learn these commands? Just type in a full URL, with or
without the http:// prefix, into the Blogger
version of Google Blog Search. It will automatically do the right thing
there and show backlinks.
- As Chris notes, Google says that for blog search backlinks, it's not
suppressing any of the links it knows about. To spell that out, here are some
figures to contemplate:
- link:blog.searchenginewatch.com on Google web search brings back "about" 4,000 results
- link:searchenginewatch.com on Google Blog Search brings back 6,586 results
- link:blog.searchenginewatch.com on MSN Search web search brings back 15,551 results
Notice, a search across the ENTIRE web on Google brings back fewer backlinks than a search across the much more limited feed database on Google Blog Search. Why? The third line shows the answer. A search across the ENTIRE web on MSN Search also brings back more results than Google web search, despite MSN supposedly having a slightly (very slightly) smaller database of pages based on self-reported figures. Google simply doesn't report all the backlinks it knows about for web search, something it has said time and again when pressed on the issue, and a fact well known to many experienced search marketers.
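For anyone who wants to script the backlink searches described above, here's a minimal sketch in Python. It only builds the text you'd type into the search box. The function name backlink_query is my own invention for illustration, and dropping a trailing slash is an assumption made for tidiness, not documented Google behavior.

```python
def backlink_query(target: str) -> str:
    """Build a link: query for a whole blog or for a single post address."""
    target = target.strip()
    # The http:// prefix is optional in these searches, so drop it if present.
    for prefix in ("http://", "https://"):
        if target.lower().startswith(prefix):
            target = target[len(prefix):]
            break
    # Assumption: a trailing slash is removed purely for tidiness.
    return "link:" + target.rstrip("/")


# Backlinks to any page on the blog
print(backlink_query("http://blog.searchenginewatch.com/"))
# Backlinks to one particular post
print(backlink_query("blog.searchenginewatch.com/blog/050831-091033"))
```

You'd still paste the resulting query into Google Blog Search, sort by date, and subscribe to the feed by hand. Nothing here talks to Google itself.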
- It's not FULL TEXT blog search. Huh? If you post to a blog, you might not
send out the entire text of your post in a feed. We don't, for instance. Our
reason is that we don't want everyone assuming they can reprint our material.
Jason Calacanis of Weblogs has written of similar issues despite copyright
warnings in his full-text feed. But Google's only currently searching what's
in the feed, meaning that it actually may be ignorant of a huge amount of blog
content that's not pushed in a feed. That produces some skewing, as I found
with PubSub back in June.
Ideally, I'd like to see Google do what Technorati does and grab the actual full-text of the post, rather than depend just on the feed. For its part, Google says this is something it's pondering.
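To make the feed-versus-full-text gap concrete, here's a toy sketch in Python. The feed, the post text and the helper names (index_feed, search) are all made up for illustration. It says nothing about how Google or Technorati actually build their indexes. It only shows that a word appearing in the post but not in the feed summary can't be found by a feed-only search.

```python
import xml.etree.ElementTree as ET

# A hypothetical partial-text feed: the item carries only a summary.
FEED = """<rss version="2.0"><channel>
<title>Example Blog</title>
<item>
<title>Thoughts on blog search</title>
<description>A short summary of the post.</description>
</item>
</channel></rss>"""

# The hypothetical full text of the same post, as it appears on the blog.
FULL_POSTS = {
    "Thoughts on blog search": (
        "A short summary of the post. "
        "The full article also discusses backlinks and spam."
    )
}


def index_feed(feed_xml):
    """Map each item title to the lowercased text available in the feed."""
    root = ET.fromstring(feed_xml)
    index = {}
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        summary = item.findtext("description", default="")
        index[title] = f"{title} {summary}".lower()
    return index


def search(index, term):
    """Return the titles whose indexed text contains the term."""
    term = term.lower()
    return [title for title, text in index.items() if term in text]


feed_index = index_feed(FEED)
full_index = {t: f"{t} {body}".lower() for t, body in FULL_POSTS.items()}

print(search(feed_index, "backlinks"))  # [] since the word is not in the feed
print(search(full_index, "backlinks"))  # ['Thoughts on blog search']
```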
- The site: command is said to work, but I didn't find that to be the case.
site:scripting.com came back with no matches, for example. But the new
blogurl:scripting.com seems to do the trick. However, compare that to
site:scripting.com on Google web search. Blog search gets about 414 matches,
while web search of that blog brings back 344,000 matches. It's a huge
difference and shows the greater blog coverage Google web search actually
gives.
The advanced search page highlights the issue. You'll see that the earliest date you can search back to is March 1, 2005. In other words, the feed database has a much shorter history range than the web database, something that full text indexing would solve -- though you'd lose the ability to more accurately do things like author and date range searching if you're taking scraped data, rather than delimited data in a feed.
- Spam clearly hasn't been eliminated. A search for google blog search brings up a series of "Related Blogs" that are all spammy in nature to me. However, the main results below look fairly clean. But for a query on google, spam is back with a vengeance. The first result (on Google's Blogger service) tells me:
Resources To Acquire Stanley Power Tool Or Draper Power Tool On The Internet Get your stanley power tool on the world wide web. The first thing I thought of is how easy it is to get stanley power tool online. Google has listings for many stanley power tool sites. There are lots of stanley power tool that will help you.
In fact, the first four results when sorted by date are all similar in terms of spammy, nonsensical copy. Doorway page spam on Google -- it is 1999!
- Freshness or comprehensiveness seems to be an issue. For that query on
google, I get the latest post as being 40 minutes ago, with the one after
that an hour ago, then the next one two hours ago. That's it? Over the past
two hours, there have only been three blog posts about Google?
While I don't want all those poor selections where just anything mentioning Google may come up, I also want to see the latest. What we need is either better spam filtering or some type of super "sort by date and relevancy" feature. PubSub's got a feature that's sort of like this, but when I last looked, I still found spam and irrelevant content getting through.
Want to discuss or comment? Visit our forum thread, Google Blog Search Launched.
Posted by Danny Sullivan on September 14, 2005 7:19 AM











