
September 16, 2008

Google Discusses Search Evaluation Process

Google has been running a series of posts about search quality. Today, the latest post in the series discusses how evaluation enters into the process.

Scott Huffman, Engineering Director, gave four reasons why search evaluation is difficult:

  • First, understanding what a user really wants when they type a query -- the query's "intent" -- can be very difficult. For highly navigational queries like [ebay] or [orbitz], we can guess that most users want to navigate to the respective sites. But how about [olympics]? Does the user want news, medal counts from the recent Beijing games, the IOC's homepage, historical information about the games, ... ? This same exact question, of course, is faced by our ranking and search UI teams. Evaluation is the other side of that coin.
  • Second, comparing the quality of search engines (whether Google versus our competitors, Google versus Google a month ago) is never black and white. It's essentially impossible to make a change that is 100% positive in all situations; with any algorithmic change you make to search, many searches will get better and some will get worse.
  • Third, there are several dimensions to "good" results. Traditional search evaluation has focused on the relevance of the results, and of course that is our highest priority as well. But today's search-engine users expect more than just relevance. Are the results fresh and timely? Are they from authoritative sources? Are they comprehensive? Are they free of spam? Are their titles and snippets descriptive enough? Do they include additional UI elements a user might find helpful for the query (maps, images, query suggestions, etc.)? Our evaluations attempt to cover each of these dimensions where appropriate.
  • Fourth, evaluating Google search quality requires covering an enormous breadth. We cover over a hundred locales (country/language pairs) with in-depth evaluation. Beyond locales, we support search quality teams working on many different kinds of queries and features. For example, we explicitly measure the quality of Google's spelling suggestions, universal search results, image and video searches, related query suggestions, stock oneboxes, and many, many more.

Not sure if I'm buying that Olympics example. Google didn't do a great job with the Beijing Olympics, and surely their algorithm could handle serving up more relevant search results during the time surrounding the event.

I'm not saying that search query intent evaluation is easy, just that the Olympics query is not quite as problematic as Google is making it out to be.

The rest of the points are things we've been hearing from Google for a long time. We know they're progressing on universal search and personalization efforts, all in their famous intent to create the best user experience.

So, what methods does Google employ to carry out these evaluations? Huffman offered up the following:

  • Human evaluators. Google makes use of evaluators in many countries and languages. These evaluators are carefully trained and are asked to evaluate the quality of search results in several different ways. We sometimes show evaluators whole result sets by themselves or "side by side" with alternatives; in other cases, we show evaluators a single result at a time for a query and ask them to rate its quality along various dimensions.
  • Live traffic experiments. We also make use of experiments, in which small fractions of queries are shown results from alternative search approaches. Ben Gomes talked about how we make use of these experiments for testing search UI elements in his previous post. With these experiments, we are able to see real users' reactions (clicks, etc.) to alternative results.
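
To make the live traffic idea concrete, here is a minimal sketch of how a small fraction of queries could be routed to an alternative approach and how click behavior could then be compared. This is an illustration only, not Google's actual method: the 1% fraction, the hash-based bucketing, and the names `assign_arm` and `click_through_rates` are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict

# Hypothetical setting: 1% of queries see the alternative results.
EXPERIMENT_FRACTION = 0.01


def assign_arm(query_id: str, fraction: float = EXPERIMENT_FRACTION) -> str:
    """Deterministically map a query ID to 'experiment' or 'control'."""
    digest = hashlib.sha256(query_id.encode("utf-8")).hexdigest()
    # First 8 hex digits give a value in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "experiment" if bucket < fraction else "control"


def click_through_rates(log):
    """Compute clicks / impressions per arm from (arm, clicked) pairs."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for arm, was_clicked in log:
        shown[arm] += 1
        clicked[arm] += int(was_clicked)
    return {arm: clicked[arm] / shown[arm] for arm in shown}


if __name__ == "__main__":
    # Made-up log entries purely to show the shape of the comparison.
    sample_log = [
        ("control", True),
        ("control", False),
        ("experiment", True),
    ]
    print(click_through_rates(sample_log))
```

Hashing the query ID keeps the assignment stable, so the same ID always lands in the same arm. Click-through rate is only one possible signal; the post mentions "clicks, etc." without saying which measurements Google actually relies on.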

    What do you think of Google's search evaluation? What evaluations would you like to see them conduct? Discuss in the comments.

    Posted by Nathania Johnson on September 16, 2008 11:07 AM



    Comments

    Great post Nathania!

    The problem is search results are subjective. When I type in golf clubs and you type in golf clubs, we can be looking for 2 totally separate results (maybe I want to buy and you are looking for news).
    They almost have to change the search results based on individual searches. They need to track what people are clicking after they type in "golf clubs" and then serve mostly those types of results (ecommerce sites vs news sites).
    I like these types of posts because it helps me understand the search engine results better. With a better understanding of the engines, I can optimize the webpages/sites better for their specific users.

    Al Scillitani  September 16, 2008 11:48 AM

    Interesting post, and Al nails it when he said that search results will always be subjective. Google has to play a guessing game on what people are looking for. If results were tailored for the individual based on browsing history then we might get somewhere.

    One More Blog  September 16, 2008 9:47 PM

    Excellent post, Nathania! If you wouldn't mind, I'd like to invite you to take a look at Surf Canyon: www.SurfCanyon.com.

    Al, if you're there, perhaps you'd like to take a look as well. This is perhaps the application that you're describing in your comment above.

    By observing "post-query" behavior we're able to re-rank the results in real time, bringing forward those that are more pertinent while suppressing those that are irrelevant. Essentially it's disambiguating user intent "on the fly" by immediately exploiting user signals.

    Please try and we'd love to hear your feedback. Thanks!

    Mark Cramer  September 17, 2008 6:16 PM
