April 19, 2007

Google Income, Profits Up

Google reported its first-quarter financial results today, with profits up 69 percent to $1 billion, or $3.18 a share, on revenues of $3.66 billion, up 63 percent year-to-year.

Google execs spent much of the earnings call emphasizing the importance of its core businesses, search and search advertising. The company has been criticized of late for its apparent lack of focus, exhibited by its dabblings in offline media, intended acquisition of display ad network DoubleClick, and other non-search activities. More details are available from ClickZ News.

Posted by Kevin Newcomb at 10:22 PM | Permalink

March 21, 2007

Minus Expenses Yahoo, Google Ad Revenues Similar

While it is well known that Google has higher search market share, recent analysis of profits shows Yahoo's ad revenue is a lot closer than the market share would suggest.

Jupiter Research blogger David Card reports that Google has 17 percent of online ad revenue to Yahoo's 16 percent.

These numbers reflect what the company keeps after expenses.... It could be that the YouTube deal is holding down those numbers... or, as some are suggesting, Yahoo keeps more of the money because it pays less to its search partners... though from what I have seen, Yahoo pays more but has lower click-through rates...

Posted by Frank Watson at 9:55 AM | Permalink

November 21, 2006

Google Breaks $500 Per Share

For fun, my MSN Direct watch is set to show the stock prices of all the major search engines. It only works in the US, and I wish I were there now to capture a screen from it showing Google having broken the $500 per share mark. AP has a longer story on it here. It's around $503 when I looked just now.

No doubt Safa Rashtchy at Piper Jaffray is keeping his fingers crossed (along with thousands of Googlers) that it will keep going. Earlier this year, Safa predicted a $600 price point by the end of 2006.

That's still a long way to go, but far closer than the $2,000 that analyst Mark Stahlman predicted a few days later. However, I don't believe Stahlman gave a time when that would happen.

Given that Google's cofounders are big fans of Berkshire Hathaway -- which I believe has never split its stock -- it's possible that inflation alone could help Google get into that range, assuming they also avoid splits. Berkshire's class B stock is at $3,588, currently.

Posted by Danny Sullivan at 11:06 AM | Permalink

November 9, 2006

Microsoft, Ask & Fox On Google At Web 2.0

Photo from kennejima at Flickr

Yesterday, Ask and Microsoft talked about taking on Google at the Web 2.0 Summit. But honestly, the highlight for me was the image of Microsoft's Steve Berkowitz sitting next to Ask's Jim Lanzone. Lanzone used to work for Steve, then took over his spot running Ask when Steve left. Both remain good friends, and it was cool to see them up on that panel side by side.

ZDNet covered what they said, plus they have an even better side-by-side photo. Jim's push in taking on Google is that its vulnerability is being distracted by projects other than search. He also puts out this new line I haven't heard used before: "Google is the model T of search. Over time peoples' needs evolve." But I heard you can have search in any color you want, as long as it's black!

Steve talked about consumer experience, the idea that search within IM might be presented differently than within a community site. Plus, he talked about Google's weakness in terms of cultural issues, such as still learning how to act as a public company.

Greg Linden also has a short write-up of the talk, looking at the question about personalized search. Steve wanted to give users complete control of their data. Jim was more pessimistic on personalized search, seemingly in terms of users actually helping with it, since more are "lazy" and don't want to customize things, which is pretty true.

Greg points over at InternetNews, which has another write-up of the talk -- this time with Microsoft CTO Ray Ozzie saying in the fight against Google, there is "immense opportunity in the core space" that he's "surprised" Microsoft hasn't branched into. I take core space to mean search.

At PaidContent.org, Ross Levinsohn of Fox Interactive is noted to have said it was "genuine" of Google CEO Eric Schmidt to have visited so quickly after Google snapped YouTube away from a possible purchase by Fox. Plus, he offers soothing words that YouTube would have been "fun" to own but Fox couldn't do it at that price.

Posted by Danny Sullivan at 8:50 AM | Permalink

Google Video Sued, Plus More Info From New SEC Filing

The Associated Press reports that Google Video has been sued for copyright infringement, though Google did not reveal who sued them. The lawsuit was disclosed by Google via a quarterly filing with the Securities and Exchange Commission (link via Gary), but we do not know much more. PaidContent reports (site currently down) that Google may loan YouTube money prior to closing the deal with them, in order to help them settle or battle certain lawsuits.

Posted by Barry Schwartz at 8:02 AM | Permalink

November 2, 2006

In UK, Google To Surpass Channel 4's Ad Dollars

The Independent reports that Google UK is expected to earn "£900m from the UK ad market in 2006." When compared to Channel 4's "£800m at the TV group" this year, Google is expected to beat this TV player in ad dollars. Channel 4's Andy Duncan said, "Some broadcasters have been very slow to realise this. The industry as a whole is frankly rather backward-looking and is perhaps underestimating the scale of change that is going on and the pace of change."

Posted by Barry Schwartz at 8:59 AM | Permalink

October 26, 2006

Google 3rd Most Valuable Technology Company

Nathan Weinberg, while at a wedding, reports that Google has passed IBM to be ranked the 3rd most valuable technology company, behind Cisco and Microsoft. Google's stock is currently trading above $475 per share, so their "market capitalization reached some 145 billion dollars, surpassing the total value of IBM at 139.5 billion." More details at The Raw Story.

Posted by Barry Schwartz at 9:33 AM | Permalink

October 19, 2006

Google Releases 2006 3rd Quarter Results

Google has released their earnings for the third quarter of 2006. You can find the earnings summary at Google's press center. Bloomberg summarizes the reports, showing how profits almost doubled and revenue increased 70 percent. Google's net income is $733.4 million, or $2.36 a share, up from $381.2 million, or $1.32, a year ago. Google's revenue rose to $2.69 billion.
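
As a quick sanity check on the "almost doubled" description, the year-over-year change can be worked out from the figures quoted above. This is only a back-of-the-envelope sketch in Python; the helper name `pct_change` is made up for illustration and is not from Google's or Bloomberg's reports.

```python
def pct_change(current: float, previous: float) -> float:
    """Return the percentage change from `previous` to `current`."""
    return (current - previous) / previous * 100.0


# Figures as quoted in the post: net income in millions of dollars,
# earnings per share in dollars.
net_income_growth = pct_change(733.4, 381.2)
eps_growth = pct_change(2.36, 1.32)

print(f"Net income: up {net_income_growth:.1f}%")
print(f"Earnings per share: up {eps_growth:.1f}%")
```

Net income comes out a bit above 92 percent higher, which fits "almost doubled." Per-share earnings grew more slowly, at just under 79 percent.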

Posted by Barry Schwartz at 5:09 PM | Permalink

October 18, 2006

Google To Own 25% Of 2006 Online Ad Revenue

An eMarketer.com report estimates that Google will account for twenty-five percent of all online ad revenue. Google's share continues to increase (65% increase YoY) while Yahoo's growth continues to decrease, eMarketer says. Google first surpassed Yahoo in ad revenue back in 2005, but barely. Google in 2006 is expected to earn over $4 billion in ad revenue but Yahoo has just $2.9 billion according to eMarketer.com.

Posted by Barry Schwartz at 9:24 AM | Permalink

October 12, 2006

Ballmer: YouTube Overvalued & Google Transferring Wealth From Content Owners

The Web According to Ballmer from BusinessWeek has Microsoft CEO Steve Ballmer questioning the value of the Google-YouTube deal and oddly warning that Google is transferring wealth away from rights holders. It's an odd statement, since that's what Microsoft wants to do as well.

First the questioning of the YouTube value:

[You've got to ask] could Google do whatever it is they're hoping to buy without paying $1.6 billion? Is YouTube really some permanent, long-term thing, or is it a fashion?....Right now, there's no business model for YouTube that would justify $1.6 billion.

Though strangely, when BusinessWeek tries to pin down what seems a clear statement that Google overpaid, Ballmer says:

I'm not saying it is overvalued. I'm not trying to say that. It depends on a set of factors. I'm not saying I wouldn't write a check for that amount of money. I might.

And back to the controversial statement about Google's relations with content:

And what about the rights holders? At the end of the day, a lot of the content that's up there is owned by somebody else.

The truth is what Google is doing now is transferring the wealth out of the hands of rights holders into Google. So media companies around the world are all threatened by Google. Why? Because basically Google is telling you how much of your ad revenue you get to keep. They better get some competition. Us. Yahoo! (YHOO). Somebody better break through or you can short all media stocks right now. As long as there are two, you can hold onto media stocks. Google understands that. And that's one reason why they're willing to lose money up front.

Microsoft has its own video sharing service up, Soapbox. It has a question answering service, Q&A. It has an entire search engine that crawls the web like Google, Windows Live. Microsoft has plans for contextual placement of ads on pages, similar to AdSense. It's specific to MSN content now, but that will inevitably change. All of these things leverage the content of others in order to make money for Microsoft. So if these actions leverage wealth away from content owners, Microsoft is just as guilty of it as Google.

Frankly, all Ballmer seems to be saying is content owners would be better off if Microsoft was a strong third participant in the ad game. Sure -- but let's not kid ourselves. Microsoft gets a lot better off by that as well, and it didn't jump into the game out of some desire to counter-balance the power of Google. It's in it to make as much money as it can, as well.

Posted by Danny Sullivan at 7:42 AM | Permalink

October 11, 2006

Yahoo Hurting While Google Healthier Than Ever

The NY Times has an article named Yahoo’s Growth Being Eroded by New Rivals (free version available at IHT.com). The article goes through how Yahoo is suffering and lagging behind its competitors. (1) They made a bid for YouTube but those talks broke down, according to the article, and Google "swooped" them up. (2) The new Yahoo search ad system, Panama, is over a year delayed. This "delay has sucked up the company’s engineering resources and prevented it from developing new advertising products."

Based on my coverage of Yahoo over the past year, it seems like webmasters, SEOs, and industry folks have become less and less interested in the company.

The LA Times has an article this morning that goes along the same theme. If you can't get to the article, try going through Google News to gain free access; it worked for me.

Postscript From Greg Sterling:

This is not the kind of publicity you want to see if you're on the PR team. While it's true that Google has momentum and Yahoo may need a kind of "shot in the arm," what people forget is that Yahoo is the largest site on the Internet with the most monthly uniques.

It also has a bunch of market-leading properties including mail, finance and local (among others). Mail is also the number one mobile site.

Google, though a very dynamic and powerful company with lots of momentum, is not without its challenges and vulnerabilities. If anything the YouTube acquisition was an admission of some of those. Though, by the same token, Google now has great opportunity with YouTube.

I'm not sure, from where I sit, how many problems identified in the Saul Hansell Times piece are real and how many are simply perceived. But perception does influence reality.

Yahoo is a little like a strong sports team that happens to be in a bit of a slump right now.

Posted by Barry Schwartz at 9:40 AM | Permalink

September 25, 2006

Fortune Looks At Chaotic Google & Whether It Can Have A "Second Act"

Chaos by design is a Fortune cover story on Google, covering the company's fast-paced, seemingly disorganized approach to products and exploring if it can come up with a "second act" to please investors:

There's nothing to suggest that its growth engine -- ad-supported search -- is in trouble. But it's clear from Google's tentative lurches into new forms of advertising and its spaghetti method of product development (toss against wall, see if sticks) that the company is searching for ways to grow beyond that well-run core.

Another highlight:

What vexed Galaxy is precisely Google's challenge today. For all its new products -- depending on how you count, Google has released at least 83 full-fledged and test-stage products -- none has altered the Web landscape the way Google.com did. Additions like the photo site Picasa, Google Finance, and Google Blog Search belie Google's ardent claim that it doesn't do me-too products. Often new services lack a stunningly obvious feature.

And:

Much-hyped projects like the comparison-shopping site Froogle (nearly four years in beta and counting) and Google's video-sharing site have been far less popular than the competition. One of Google's biggest misses is its social-networking site, Orkut, which is a hit only in Brazil and -- as Marissa Mayer, Google's 31-year-old vice president of search products and user experience, says with an impressively straight face -- is "very strong in Iran."

In case you've missed it, the entire Google-needs-a-hit-like-web-search theme/meme has been strong this year. I tend to view those expectations as unrealistic. I agree, it's been some time since Google's come out with an "oh wow" product similar to Gmail or Google Maps, where people get very buzzed about it being so different or new or unique. But then again, I haven't exactly been "oh wowed" that much by stuff out of Yahoo or Microsoft or Ask, either.

For me, personally, Yahoo's Flickr has become a category killer in photo sharing. I use it all the time. But it wasn't home grown. Yahoo Answers is more of a homegrown wow -- perhaps not so much with me, but with others certainly.

Microsoft's Windows Live Local rocks for how anyone can create custom collections, and the new image search interface is wonderful. Both get wows or "cools" out of me.

Ask has been doing great stuff with its maps and smart answers -- the earthquake smart answer like you see here is the latest and something I totally would have gone to Ask for last month during a small Bay Area quake I felt.

Still, Google's got plenty of good stuff as well. I don't know that any of these players are going to roll out a major "second act" that blows away everything before. Instead, it's more likely we're going to see a steady growth of products, with all of them having some gains and plenty of products that simply won't catch on.

Posted by Danny Sullivan at 10:01 AM | Permalink

August 30, 2006

Google CEO Eric Schmidt Joins Apple's Board Of Directors

Google CEO Eric Schmidt's looking for another small company to help run -- this time, Apple. He's just been elected to Apple's board of directors.

Google CEO Dr. Eric Schmidt Joins Apple's Board of Directors is the press release on the move, with these quotes from the two main men:

"Eric is obviously doing a terrific job as CEO of Google, and we look forward to his contributions as a member of Apple's board of directors," said Steve Jobs, Apple's CEO. "Like Apple, Google is very focused on innovation and we think Eric's insights and experience will be very valuable in helping to guide Apple in the years ahead."

"Apple is one of the companies in the world that I most admire," said Eric Schmidt. "I'm really looking forward to working with Steve and Apple's board to help with all of the amazing things Apple is doing."

Google CEO elected to Apple Computer board of directors from the AFP has the expected (and reasonable) speculation that this will mean closer ties for Google and Apple.

The Wall Street Journal in Google CEO Schmidt Joins Apple Computer Board (paid sub. probably required) notes some of the cross-pollination going on:

Mr. Schmidt's election deepens existing high-level personal ties between the two companies. Genentech Inc. CEO Arthur Levinson sits on the Google and Apple boards, while former Vice President Al Gore and Intuit Inc. Chairman Bill Campbell, both Apple directors, are longtime advisers to Google. Mr. Schmidt's appointment means half of Apple's eight-person board of directors has a formal relationship with Google.

Messrs. Schmidt and Jobs also share the battle scars from long careers competing against Microsoft, Redmond, Wash. Mr. Schmidt, one of Silicon Valley's most seasoned technologists, spent more than a dozen years at Sun Microsystems Inc. starting in 1983, rising to the post of chief technology officer during that computer maker's fierce efforts to establish the Java programming language as an alternative to Microsoft's dominant programming standards. Mr. Schmidt joined Novell Inc., a bitter Microsoft rival in the market for network software, in 1997 as chairman and CEO.

Want to comment or discuss? Join our Search Engine Watch Forums thread, Google CEO Eric Schmidt Joins Apple's Board.

Posted by Danny Sullivan at 7:32 AM | Permalink

August 25, 2006

Google Has Too Much Money, Possibly Classified As Investment Fund

Bloomberg and the Wall Street Journal report that Google has asked the SEC for an exemption from a rule that would cause it to be regulated as a mutual fund company. Basically, if a company's securities make up more than 40 percent of its assets, then it can be classified as a mutual fund company. Google's plea to the SEC was "that it is not in the business of investing, reinvesting, or trading in securities." Right now no one knows if Google will be granted the exemption.
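
To make the 40 percent threshold concrete, here is a minimal sketch of the test as summarized above. It simplifies the rule to a single ratio and is not the legal test itself; the function name and the balance-sheet numbers are hypothetical and are not Google's actual figures.

```python
def exceeds_securities_threshold(securities: float, total_assets: float,
                                 threshold: float = 0.40) -> bool:
    """Return True if securities make up more than `threshold` of total assets."""
    if total_assets <= 0:
        raise ValueError("total_assets must be positive")
    return securities / total_assets > threshold


# Hypothetical figures in billions of dollars, chosen only for illustration.
print(exceeds_securities_threshold(securities=6.0, total_assets=10.0))  # 60% of assets
print(exceeds_securities_threshold(securities=3.0, total_assets=10.0))  # 30% of assets
```

A company sitting on a large pile of securities relative to its other assets trips the check even if, like Google, it says investing is not its business.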

Posted by Barry Schwartz at 10:01 AM | Permalink

August 23, 2006

Google's Dominance Of Big & Small Companies

Fortune has a nice write up they named "How Google can make - or break - your company." Not only does this article go over how Google can break a small online retailer who depends on organic results, but also how they can break large firms like travel agencies, newspapers, realtors, advertising firms and software makers (even Microsoft). The article makes a good read if you have the time. If you have more time, also read Google Sees Content Deals As Key to Long-Term Growth at the Wall Street Journal, which explores more of Google's future and how you may be a part of it.

Posted by Barry Schwartz at 8:27 AM | Permalink

August 18, 2006

Googlers Only Have Sold GOOG Stock - Cause Of Drop In Stock Price?

Bloomberg has a very interesting report on why they believe Google's stock has been falling, down about 7 percent this year. They say that Google's executives have sold off a boatload of stock since the IPO.

"Google's top executives have offloaded about $7.4 billion of stock, equal to about a third of the company's starting market value when it sold shares at $85 each in the August 2004 IPO," says Bloomberg columnist, Mark Gilbert. Not only that, he reports "not a single Google insider has bought a single share of the company in the 18 months since the IPO lock-ups expired." Can you believe that!

Postscript From Danny: It's worth noting that at least to me, the idea that the insiders are selling their stock and not buying is unsurprising. They've got a lot of stock. A lot of stock!

Buying some shares would probably be a good PR move, and after an article like this one, I can imagine some of the execs might start doing it. But the point of selling, as the article itself notes, is to diversify portfolios that, for these execs, are ironically unhealthily skewed toward Google.

For the curious, there are various places to see insider sales over time. Yahoo has a nice list here. Note how entries for Eric Schmidt and many others are tagged "automatic." That's because, to my knowledge, they have preplanned to diversify their portfolios by selling shares automatically over time. That protects them against accusations of insider sales.

Also interesting are entries like exec Omid Kordestani acquiring 76,459 shares on June 12, 2006. Didn't the Bloomberg article say no big Googlers were buying? Yes -- so what's this? I assume that Googlers might still be gaining shares in other ways, which adds further understanding as to why they might not be buying on the open market.

Finally, it's no surprise that over the past 18 months neither founder, Larry Page nor Sergey Brin, has been selling. That's because they already said in 2004 that they'd spend the next 18 months diversifying their portfolios through planned sales.

Overall, insider trades are definitely interesting to watch, and I'm sure Google will take a PR black eye over the apparent lack of purchases. But I think there are factors that don't make it as bad as it seems.

Posted by Barry Schwartz at 8:12 AM | Permalink

August 10, 2006

Google's Costs To Increase With Data Center Needs & Increased Employee Compensation

CNN Money covers how Google reports that expenses will rise in its latest 10-Q filing. Reason? Increased costs of data centers and the demand for higher employee compensation. Google commented in the report, "Our cost of revenues will increase in 2006 primarily as a result of anticipated increases in traffic acquisition and data center costs, although traffic acquisition costs may fluctuate as a percentage of advertising revenues," and "our cash-based compensation per employee will likely increase."

Posted by Barry Schwartz at 11:15 AM | Permalink

August 7, 2006

Google & MySpace In $900 Million Deal On Search & Contextual Ads

Just in, an announcement that Google and MySpace have reached a deal for Google to provide search and contextual ads to MySpace, in return for giving MySpace (well, the entire Fox Interactive Media network) $900 million in guaranteed payments through 2010. From the press release:

MOUNTAIN VIEW and LOS ANGELES, Calif., August 7, 2006 - News Corporation's Fox Interactive Media and Google Inc. (NASDAQ: GOOG) today announced a multi-year search technology and services agreement whereby Google will be the exclusive search and keyword targeted advertising sales provider for Fox Interactive Media's growing network of web properties including MySpace.com (http://www.myspace.com).

The agreement calls for Google to power web, vertical and site specific search for MySpace.com and the majority of Fox Interactive Media properties. Google will be the exclusive provider of text-based advertising and keyword targeted ads through its AdSense program, for inventory on Fox Interactive Media's network. Google will also have a right of first refusal on display advertising sold through third parties on Fox Interactive Media's network.

The integration of Google's services including consistent search navigation across Fox Interactive Media's network of properties is slated to begin in the fourth quarter 2006 and will provide users with access to Google's industry leading search capabilities as well as text and display advertising from its global advertiser base.

Under the terms of the agreement, Google will be obligated to make guaranteed minimum revenue share payments to Fox Interactive Media of $900 million based on Fox achieving certain traffic and other commitments. These guaranteed minimum revenue share payments are expected to be made over the period beginning in the first quarter of 2007 and ending in the second quarter of 2010.

I'm at our Search Engine Strategies show in San Jose at the moment, so I don't have time to do a long post on the news, which I'm still digesting. I've taken a number of phone calls on it already, so I'll provide what I've given to some other reporters who have asked.

  • Big win for Google? Sure. Lots of traditional players are worried about MySpace, even if the site itself isn't earning that much now, from what I understand. This gets Google in, keeps Yahoo and Microsoft out, and might be a cheap payment to protect Google's front in the social networking wars. In other words, even if Google doesn't make a net profit off of MySpace, the intangibles could be worth the cost. The closer ties also give Google deeper insight into the MySpace traffic, since it will soon see everyone going to these pages. That will be very helpful for Google if it wants to do a renewed social networking effort of its own.  
  • Big loss for Microsoft and Yahoo? Maybe, maybe not. If social networking is hot, both of them -- unlike Google -- have very healthy communities in several international markets. In fact, that potentially could have been an issue in trying to win MySpace. Revenue-wise, Yahoo indirectly provides ads to MySpace, but current revenue doesn't appear to be substantial, plus Yahoo already would have been giving a big chunk of this to whomever is the unknown middleman.

John Battelle notes there's a conference call going on, plus he's working on some follow-ups, so keep an eye on his post. I or Barry will also postscript stories from elsewhere to our post here or do a fresh round-up tomorrow.

Posted by Danny Sullivan at 5:47 PM | Permalink

July 31, 2006

NASDAQ Error Sends Google's Stock Price Down To $38

The New York Sun reports that on Thursday, during after-hours trading, Google's stock price fell accidentally by $350 to about $38, due to some glitch. Reportedly, "someone from a Nasdaq member firm punched in an erroneous figure to commence a trade," which caused the error. Thursday, between 4:10 p.m. and 4:12 p.m., prices for Google stock were as low as $38. At 5:01 p.m. NASDAQ disclosed their decision to "cancel all after-hours trades in Google that were at or below $352.07." So for those of you that thought you made it big, I am sorry. And for those of you that thought you lost your shirts, I am happy for you.

Posted by Barry Schwartz at 10:35 AM | Permalink

July 25, 2006

The Abridged Version: Independent Report On Google's Click Fraud Detection Practices

Last Friday, an independent report on how Google deals with click fraud was published as part of the ongoing Lane's Gifts v. Google class action lawsuit over click fraud. To my knowledge, it is the most comprehensive, detailed public look into how Google deals with click fraud that's ever come out. It finds that Google's efforts to combat the issue have been reasonable, though there are some eyebrow-raising bits on how the author only finds the situation was under control by the end of 2005 and how it's impossible to fully know whether some clicks are invalid -- and thus potentially impossible to prevent some types of fraud through purely automated means.

The report is long, a 47 page PDF file. Anyone interested in click fraud issues should give it a thorough read. But given how everyone's always busy, I thought I'd highlight below a number of sections that stood out in my review of the document.

The report is by Dr. Alexander Tuzhilin, Professor of Information Systems at New York University. To prepare it, he says in the Executive Summary at the beginning (page 1):

I have been asked to evaluate Google’s invalid click detection efforts and to conclude whether these efforts are reasonable or not. As a part of this evaluation, I have visited Google’s campus three times, examined various internal documents, interviewed several Google’s employees, have seen different demos of their invalid click inspection system, and examined internal reports and charts showing various aspects of performance of Google’s invalid click detection system. Based on all these studied materials and the information narrated to me by Google’s employees, I conclude that Google’s efforts to combat click fraud are reasonable. In the rest of this report, I elaborate on this point.

Immediately, the first thing that comes to mind is that he makes no mention of talking with individual advertisers, which could lead you to think that if he's only talking with Google, of course he's likely to come away with the idea that Google is doing everything just fine.

When you read the report, it's clear this isn't the case. Google does come under criticism. It's also important to realize Tuzhilin was not employed by Google to create this report. He's an independent expert appointed, to my knowledge, by the court. Exactly how he was selected is unclear, and I do think it would be a better report if advertiser data had been involved. But there's still plenty of good stuff here to digest.

Page 2 covers his background and materials reviewed from Google to prepare the report.

Page 3 and some of page 4 cover those he talked with at Google. Interesting details are that Google's click quality team consists of about 36 people, one-third engineers looking to design detection systems and the remaining two-thirds dedicated to doing manual investigations of suspected fraud.

Pages 4 through 6 cover the history of the internet, search engines and Google, most of which isn't that necessary for most experienced search marketers. Page 7 talks about three main ways of purchasing advertising:

  • CPM - cost per thousand impressions
  • CPC - cost per click
  • CPA - cost per action

Again, basic stuff. But it's worth touching on because of some of the current debate that Google and other search engines will be forced to go to CPA pricing to fully eliminate fraud.
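
To see why the pricing model matters for the fraud debate, here is a rough sketch of how an advertiser's bill is computed under each model. All rates and campaign numbers are hypothetical, the function names are made up for illustration, and nothing here is taken from the report or from Google's billing systems. The last part assumes, purely for the sake of the example, that a batch of invalid clicks goes undetected and never converts.

```python
def cpm_cost(impressions: int, rate_per_thousand: float) -> float:
    """Cost when paying per thousand ad impressions."""
    return impressions / 1000 * rate_per_thousand


def cpc_cost(clicks: int, rate_per_click: float) -> float:
    """Cost when paying per click."""
    return clicks * rate_per_click


def cpa_cost(actions: int, rate_per_action: float) -> float:
    """Cost when paying per completed action (for example, a sale)."""
    return actions * rate_per_action


# Hypothetical campaign: 100,000 impressions, 2,000 clicks, 50 conversions.
impressions, clicks, actions = 100_000, 2_000, 50

clean_cpc = cpc_cost(clicks, 0.25)
clean_cpa = cpa_cost(actions, 10.00)
print(cpm_cost(impressions, 5.00), clean_cpc, clean_cpa)

# Assumption: 200 invalid clicks slip through and none of them convert.
invalid_clicks = 200
fraud_cpc = cpc_cost(clicks + invalid_clicks, 0.25)
fraud_cpa = cpa_cost(actions, 10.00)  # unchanged: no extra conversions
print(fraud_cpc - clean_cpc, fraud_cpa - clean_cpa)
```

Under these assumptions the CPC bill rises with every undetected invalid click, while the CPA bill does not move. That is the intuition behind the argument for CPA, though it says nothing about other kinds of fraud, such as faked conversions.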

On page 8, Tuzhilin lends some support to this, or at least to the problems that others have raised with CPC:

Although currently popular, the CPC/PPC model has two fundamental problems:

  • Although correlated, good click-through rates (CTRs) are still not indicative of good conversion rates, since it is still not clear if a visitor would buy an advertised product once he or she clicked on the ad. In this respect, the CPA-based models provide better solutions for the advertisers (but not necessarily for the search engines), since they are more indicative that their ads are “working.”  
  • It does not offer any “built-in” fundamental protection mechanisms against the click fraud since it is very hard to specify which clicks are valid vs. invalid in general, as will be explained in Section 8 (it can be done relatively easily in some special cases, but not in general). For this reason, major search engines launched extensive invalid click detection programs and still face problems combating click fraud.

In response to these two problems and for various other business reasons, Google is currently testing a CPA payment model, according to some reports in the media. Some analysts believe that the conversion-based CPA model is more robust for the advertisers and also less prone to click fraud. Therefore, they believe that the future of the online advertising payments lies with the CPA model. Although this is only a belief that is not supported by strong evidence yet, Google is getting ready for the next stage of the online advertising “marathon.”

What Will Replace Pay-Per-Click Advertising? over at Publishing 2.0 from Scott Karp is a good roundup and debate on some of the issues of CPA perhaps as the solution to CPC issues.

I've posted lots of comments in Karp's post, but my personal view is this. Currently, Google is offering all three major payment systems: CPC, CPM and CPA. It is offering all three not just because of fraud issues but because advertisers have different goals with advertising, where different payment models may be required.

Building brand? You want impressions perhaps more than clickthrough, and suddenly CPM makes sense. Really savvy with conversion tracking? CPA might make more sense for you, as a way for you to feel less likely to be exposed to fraud and more likely to really be paying only for key traffic. Fairly rudimentary with conversion tracking? Doing low-cost CPC ads might make a lot of sense, for your situation. And beyond the three big ones, I'm sure we'll see other options emerge. The unifying goal around all of them, from Google's perspective, will be figuring out a way to help advertisers track that the ads are working according to some type of metrics that the advertisers want.

Skipping down past background on how AdWords works and the AdSense program (AdSense For Domains doesn't get mentioned, though it's a major program), page 13 starts in on what Google can tell about clicking activities.

Google is apparently making use of conversion data that advertisers provide to determine if fraudulent clicks are happening. My understanding was that conversion data was supposed to be ringfenced and not used by Google for anything, not even in the aggregate. But perhaps the policy has changed or perhaps I misunderstood this. I'll check on that (and also note that confusingly, the report says on page 34 that "None of the filters uses the conversion information that Google collects"). Certainly Google made no such restrictions when it launched Google Checkout. But even with conversion data, the report notes using this info isn't perfect.

Google collects various types of information about querying and clicking activities, including certain types of “post-clicking” data about conversion actions on the advertiser’s website where the visitor is taken following the click. All this data accumulated by Google is extracted from various sources and contains comprehensive information about visitor’s activities on the Google Network.

As stated before, the conversion data – the “post-clicking” data about conversion actions on the advertiser’s website – constitutes an important piece of this collected data. In particular, if the advertiser formally agrees to provide this information, Google collects data on whether or not the user visited certain designated pages on the advertised website that the advertiser marked as “conversion” pages, such as the checkout page and certain form filling pages. This conversion data is limited to what the advertiser decided to provide to Google and is not as rich as the clickstream data collected by advertisers themselves on their websites. Also, many advertisers decide to opt out from providing this conversion data. In this case, Google does not have any conversion information and therefore does not know what happened after a visitor clicked on the ad. Nevertheless, this post-clicking conversion data is important for Google even in its limited form because it conveys some intentions of the visitors on the advertised website and provides good insights into whether or not the visitor is seriously considering purchasing the advertised product or service....

This “raw” clicking data described above is subsequently cleaned, preprocessed and stored in various internal logs by Google for different types of subsequent analysis conducted on this data.

One inherent weakness of Google’s (or any other search engine) data collection effort that is important for detecting invalid clicks, is inability to get full access to all the clicking activities of the visitors of the advertised website. In other words, the conversion data that Google collects provides only a partial picture of all the post-clicking activities of the visitor on the advertised website. This data is important for detecting invalid clicks since better invalid click detection methods can be developed using this data. Unfortunately, Google (and other search engines) does not have full access to this data, unless the advertised website decides to provide its clickstream data to Google, which many websites are reluctant to do. However, this is not Google’s fault – this is an inherent limitation of the types of data available to Google.

While it might not be perfect, the report also notes at the end of this section that no one has the perfect collection of information:

However, this lack of full conversion data available to Google is compensated by various types of querying and clicking data that Google can collect, whereas advertisers and third-party vendors cannot. Therefore, there exists a tradeoff between the types of data relevant for detecting invalid clicks that is available to Google, advertisers and the third-party vendors. None of these three groups have the most comprehensive set of data pertinent to detecting invalid clicks, and each of them needs to settle for the invalid click detection methods possible only with the data that they have.

On page 14, the report addresses the frustration advertisers feel over the relatively non-granular nature of Google's reporting versus Google's need to keep some things carefully protected:

The smallest unit of analysis is one day. For example, the number of invalid clicks on an ad detected by Google (or any other related statistic) can only be reported on a daily basis (although there are certain alternative methods of obtaining aggregation granularity that is smaller than a day). In other words, advertisers cannot know if a particular click on a particular ad was marked as valid or invalid by Google, and Google refuses to provide this information to advertisers.

This is a source of contention and dispute between Google and the advertisers, and one can understand both parties in this dispute. On one hand, the advertiser has the right to know why a particular click was marked as valid by Google (when the advertiser thinks that it is invalid) because the advertiser pays for this click. On the other hand, if Google discloses this information, it opens itself to click fraud on a massive scale because, by doing so, it provides certain hints about how its invalid click detection methods work. This means that unethical users will immediately take advantage of this information to conduct more sophisticated fraudulent activities undetectable by Google’s methods.

This conflicting dilemma between advertisers’ right to know and Google’s inability to provide the appropriate information to advertisers because of the security concerns is part of the Fundamental Problem of the PPC advertising model to be discussed in the next section. More recently, Google tried to bridge this gap between Google and the advertisers.

Page 15 spends time looking at various definitions of click fraud, bringing us to page 16, which raises the bigger issue that it is impossible to know the intent of ALL clicks, which is crucial to understanding what chunk of them might be fraudulent:

Unfortunately, in several cases it is hard or even impossible to determine the true intent of a click using any technological means. For example, a person might have clicked on an ad, looked at it, went somewhere else but then decided to have another look at the ad shortly thereafter to make sure that he/she got all the necessary information from the ad. Is this second click invalid? To make things even more complicated, the second click may not be strictly necessary since the person remembers the content of the ad reasonably well (hence there is no real need for the second click). However, the person may not really like or care about the advertiser and decides to make this second click anyway (to make sure that he/she did not miss anything in the ad and his/her information is indeed correct) without any concerns that the advertiser may end up paying for this second click (since the person really does not care about the advertiser and his/her own interests of not missing anything in the ad overweigh the concerns of hurting the advertiser). Therefore, in some cases the true intent of a click can be identified only after examining deep psychological processes, subtle nuances of human behavior and other considerations in the mind of the clicking person.

Soon after this, on page 17, comes the first real bombshell to me. As said above, you can't detect the intent of all clicks. Given this, there's no reasonable way to be certain that technological fixes for click fraud detection are working:

In summary, between the obviously clear cases of valid and invalid clicks, lies the whole spectrum of highly complicated cases when the clicking intent is far from clear and depends on a whole range of complicated factors, including the parameter values of the click. Therefore, this intent (and thus the validity of a click based on the above definitions) cannot be operationalized and detected by technological means with any reasonable measure of certainty.

What? Didn't the report find Google was acting reasonably? Yes, and I think that's because, as the report goes on to explain, Google isn't relying solely on automated means to stop click fraud. If it were, some clicks might get through.

Page 18 picks up the issue even more strongly, and I've bolded this section because it deserves special attention. Note that the italics were originally included:

The last statement has one important implication: given a particular click in a log file, it is impossible to say with certainty if this click is valid or not in all the cases. This means that

  • It is impossible to measure the true rates of invalid clicking activities, and all the reports published in the business press are only guesstimates at best.  
  • The invalid click detection methods need to be developed without a proper operationalizable conceptual definition of invalid clicks.

The important word above is all the cases since in some cases it can be stated with certainty if a particular click is valid or not. For example, it is easy to detect a doubleclick using relatively simple technological means, assuming that the doubleclick is invalid.

Again, it seems to be the case that automation can catch some, perhaps lots, of click fraud, but it can't catch all of it because of the intent problem. Also crucial in the above is the stress on the fact that the rates we've been given from various sources are simply guesses, since the intent of clicks isn't known to some of these other sources.
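The doubleclick is the one case the report says can be caught with "relatively simple technological means." Here's a minimal sketch of what such a check might look like in Python. The function name, the record layout and the two-second window are my assumptions, since the report doesn't disclose how Google actually defines or detects a doubleclick.

```python
from datetime import datetime, timedelta


def flag_doubleclicks(clicks, window=timedelta(seconds=2)):
    """Flag repeat clicks by the same visitor on the same ad within `window`.

    `clicks` is a list of (visitor_id, ad_id, timestamp) tuples sorted by
    timestamp. Returns a list of booleans, True meaning "treat as invalid".
    The window length is a hypothetical value, not Google's threshold.
    """
    last_seen = {}  # (visitor_id, ad_id) -> timestamp of the previous click
    flags = []
    for visitor_id, ad_id, ts in clicks:
        key = (visitor_id, ad_id)
        previous = last_seen.get(key)
        flags.append(previous is not None and ts - previous <= window)
        last_seen[key] = ts
    return flags


t0 = datetime(2006, 7, 21, 12, 0, 0)
sample = [
    ("v1", "ad1", t0),                          # first click
    ("v1", "ad1", t0 + timedelta(seconds=1)),   # immediate repeat
    ("v2", "ad1", t0 + timedelta(seconds=1)),   # different visitor
    ("v1", "ad1", t0 + timedelta(seconds=60)),  # same visitor, a minute later
]
print(flag_doubleclicks(sample))
```

Notice what the sketch can't do: the click a minute later passes the check, yet it's exactly the kind of "second look" whose intent the report says no technology can determine.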

Indeed, in the case of the recent Outsell report, you don't even have to worry about figuring out the intent of particular clicks. Click fraud stats from that report come from half the panel entirely guessing about what click fraud rates they might have -- guessing, because that half does no auditing of clicks at all.

Page 19 deals with ways of identifying invalid clicks, at least according to operational approaches -- i.e., automated criteria. Do the clicks:

  1. Show an anomaly from past clicking patterns for a site or ad?
  2. Violate certain predefined rules?
  3. Fall into certain classes of behavior that are deemed invalid?
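
For readers who want a feel for the first two approaches, here's a rough Python sketch of an anomaly check and a rule check. Everything in it is hypothetical: the three-standard-deviation threshold, the rule names, the per-IP click limit and the example IP address are my own stand-ins, not anything the report reveals about Google's filters. I've left out the third approach, since the report gives too little detail to sketch it fairly.

```python
from statistics import mean, stdev


def is_anomalous(history, today, threshold=3.0):
    """Anomaly-based check: is today's click count far above the past pattern?

    `history` is a list of past daily click counts for a site or ad.
    `threshold` is a hypothetical number of standard deviations.
    """
    if len(history) < 2:
        return False  # not enough history to say anything
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today > mu
    return (today - mu) / sigma > threshold


# Rule-based check: each rule is a named predicate over a click record.
# Both rules and their limits are made up for illustration.
RULES = {
    "too_many_clicks_from_ip": lambda c: c["clicks_from_ip_today"] > 20,
    "blocked_ip": lambda c: c["ip"] in {"203.0.113.7"},
}


def violated_rules(click, rules=RULES):
    """Return the names of all rules this click record violates."""
    return [name for name, rule in rules.items() if rule(click)]


print(is_anomalous([100, 110, 90, 105, 95], today=400))
print(violated_rules({"ip": "203.0.113.7", "clicks_from_ip_today": 3}))
```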

Page 20 explains that Google primarily depends on the first two approaches -- looking for anomalies and using rules -- but then gets into what it stresses as the "Fundamental Problem" of fraudulent clicks:

We conclude that there is a fundamental problem associated with the definition of invalid clicks for the Pay-per-Click model. This problem can be summarized as follows:

  • There is no conceptual definition of invalid clicks that can be operationalized in the sense defined above.  
  • An operational definition cannot be fully disclosed to the general public because of the concerns that unethical users will take advantage of it, which may lead to a massive click fraud. However, if it is not disclosed, advertisers cannot verify or even dispute why they have been charged for certain clicks.

This problem lies at the heart of the click fraud debate and constitutes the main problem of the CPC model: it is inherently vulnerable to click fraud.

Page 21 poses solutions to the problem:

  • The “trust us” approach of the search engines. The search engines can assure advertisers that they are doing everything possible to protect them against the click fraud. This is not easy because of the inherent conflict of interest between the two parties: the money from invalid clicks directly contribute to the bottom lines of the search engines. Nevertheless, it may be possible for the search engines to solve this trust problem by developing lasting relationships with the advertisers. However, the discussion of how this can be done lies outside of the scope of this report.  
  • Third-party auditors. Independent third-party vendors, who have no financial conflicts of interest, can work with advertisers and audit their clickstream files to detect invalid clicks.

These two approaches would still constitute only a partial solution to the Fundamental Problem because there is no conceptual definition of invalid clicks that can be operationalized.

Page 21 continues on looking at how Google does click fraud detection, covering a range of general preventative measures and more active things done when clicks actually happen.

On page 23, a look at filtering systems begins, ending with this summary that's positive for Google, at the moment. It also stresses that filtering will always come under new challenges:

The current set of Google filters is fairly stable and only requires periodic “tuning” and “maintenance” rather than a radical re-engineering, even when major fraudulent attacks are launched against the Google Network. It also demonstrates that various recent efforts of the Click Quality team to improve performance of their filters produce only incremental improvements. Thus, the Click Quality team currently reached a stability point since additional efforts to enhance filters produce only marginal improvements.

Having said this, the Click Quality team also realizes that this is only a local stability point in the sense that major future modifications in clicking patterns of online users and new types of fraudulent attacks against Google can lead to radically new types of invalid clicks that the current set of filters can miss. Therefore, the Click Quality team is working on the next generation of more powerful filters that will monitor a broader set of signals and more complex monitoring conditions. These new filters will require a more powerful computing infrastructure than is currently available, and the Click Quality team also participates in developing this infrastructure. Their overall goal is to make click spam hard and unrewarding for the unethical users thus making it uneconomical for them and turning many of them away from Google and the Google Network.

At page 28, the expert notes that Google's filters are relatively simple in nature, yet they work:

The structure of most of Google’s filters, with a few exceptions, is surprisingly simple. I was initially puzzled and thought that Google did not do a reasonable job in developing better and more sophisticated filters. I was initially certain that these simple filters should miss many types of more complicated attacks. However, the evidence reported in the previous two sections indicates that these simple filters perform reasonably well.

Why? A variety of reasons, such as unsophisticated attacks:

Although some of the coordinated attacks can be quite sophisticated, the majority of the invalid clicks usually come from relatively simple sources and less experienced perpetrators....Still, there are certain types of attacks that Google filters will miss; but these attacks should be quite sophisticated and would require significant ingenuity to launch. Therefore, there cannot be too many of these, unless perpetrators become much more imaginative....

The Long Tail / Search Tail even gets a mention, with the idea being that -- if I understand correctly -- most activity focuses around the same type of things that the filters work well to detect. In other words, the filters do well at cutting off the head of click fraud -- and if tail activity gets through, it's relatively little in comparison:

Despite its current reasonable performance, this situation may change significantly in the future if new attacks will shift towards the Long Tail of the Zipf distribution by becoming more sophisticated and diverse.
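
To get a sense of why catching the head matters so much under a Zipf distribution, here's a small Python sketch. It assumes a textbook Zipf curve with exponent 1, where the attack type ranked r occurs with frequency proportional to 1/r. The report doesn't give the actual shape of Figure 1, and the counts of 1,000 attack types and a 100-type head are invented for illustration.

```python
def zipf_head_share(n_types, head_size, exponent=1.0):
    """Share of all activity covered by the `head_size` most common types,
    assuming frequency is proportional to 1 / rank**exponent."""
    weights = [1 / rank**exponent for rank in range(1, n_types + 1)]
    return sum(weights[:head_size]) / sum(weights)


# Hypothetical: 1,000 distinct attack types, filters tuned to the top 100.
print(f"{zipf_head_share(1000, 100):.0%} of activity sits in the head")
```

Under those assumptions, the top tenth of attack types accounts for roughly 69 percent of activity. That's the good news, and also the warning: if attackers spread out into the tail, the same filters cover much less.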

At the bottom of page 29, the report starts examining whether Google is letting stuff slide to earn more money:

Since Google does not charge advertisers for invalid clicks, this means that it loses money by filtering out these clicks. Thus, there is a financial incentive for Google not to forgo some of these revenues and simply be “easy” on filtering out invalid clicks. Therefore, it is important to know if any business considerations entered into the filter specification process or is it entirely determined by Google’s engineers in an objective manner with a single purpose to protect the advertiser base. This is one of the important issues that I investigated as a part of my studies of how Google manages detection of invalid clicks....

The conclusion is that Google isn't trying to favor itself:

I have spent a significant amount of time trying to understand who sets these threshold parameters, how, and what are the procedures and processes for setting them. In particular, I tried to understand if it is an entirely engineering decision that tries to protect the advertisers from invalid clicks or any of the business groups at Google are involved in this decision process with the purpose of influencing it towards generating extra revenues for Google.

As a result of these investigations, I realized that it constitutes exclusively an engineering decision with no inputs from the finance department or the business units, except the following two cases:

  • The first one was a special case when one particular IP address was disabled because of inappropriate clicking activities, and a business unit requested the Click Quality team to conduct an additional investigation since it was an important customer associated with that IP address, and restore it if the investigation results were negative. When I was explained what had happened, I felt that Google’s actions were reasonable in this particular situation.  
  • The change in the doubleclick policy that was considered in Winter 2005 and implemented in March 2005. It turned out that the change in the doubleclick policy (i.e., not to charge advertisers for the immediate second click in a doubleclick) had non-trivial financial implications for Google. Being a publicly traded company at that time, this change would have had a noticeable effect on Google’s total revenues with corresponding implications for the financial performance of the company. Therefore, this policy change had legitimate concerns for Google’s management, and these financial implications have been discussed in the company. Still, despite its noticeable negative effects on its financial performance, Google decided to abandon the old doubleclick policy and not to charge advertisers for the second click, which was an appropriate action to take.

In conclusion, with the exception of the doubleclick, I found Google’s processes for specifying filters and setting parameters in these filters driven exclusively by the consideration to protect the advertiser base, and, therefore, being reasonable.

Doubleclick constitutes a special case. For me, the second click in the doubleclick is invalid, as I argued in Section 8, and the advertisers should not be charged for it. It is not clear to me why it took Google so long to revise the policy of charging for doubleclicks. Nevertheless, this policy was revised in March 2005 despite the fact that the company lost “noticeable” revenues by taking this action.

I find the conclusion that Google wasn't trying to benefit itself doesn't mesh well with the expert's own concern/confusion/uncertainty about why Google took so long to change its policy on doubleclicks. Moreover, that entire policy isn't well explained. Way back up on page 20, there's this very brief mention:

It turns out that Google had a history associated with the definition of a doubleclick: at some point doubleclick was considered to be a valid click and advertisers were charged for it, while subsequently Google reconsidered and treated doubleclick as invalid.

And that's it until the later section of the report, where Google is effectively accused of footdragging on changing its policy and where business discussions about the change are acknowledged. Google then seems to be given the all clear because it eventually did the right thing.

The entire matter is something that feels like it should have been explored more, but page 31 sheds light as to why this might have been difficult. Google's apparently had a complete staff change in relation to click fraud detection since it began charging by the click:

In this subsection, I will describe the history of development of Google filters. First of all, I would like to point out that most of the descriptions in this subsection are not based on documents provided to me by Google but rather on the verbal descriptions by the members of the Click Quality team based on their recollections of the past events and on the “folklore” evidence since none of the team members I interviewed were even around or involved in the click fraud effort when the AdWords program was introduced in February 2002.

The section continues with detection divided into these groupings -- and I've bolded a key part:

  • The Early Days (February 2002 – Summer 2003). These were the early days of the PPC model and of the click fraud characterized by extensive learning about the problem and determining ways to deal with it.  
  • The Formation Stage (Summer 2003 – Fall 2005). This stage started with the introduction of the AdSense program in March 2003, formation of the Google Click Quality team in the Spring/Summer 2003, launch of new filters and the intent to take the invalid click detection efforts to the “next level.” It ended with the development of the whole infrastructure for combating invalid clicks and the consolidation of Google’s invalid click detection efforts. This stage was characterized by significant progress in combating invalid clicking activities and developing mature systems and processes for accomplishing this task. Although the Click Quality team’s solutions were still not perfect, based on the information provided to me by Google, I reached the conclusion that the invalid clicking problem at Google was “under control” by the end of 2005.  
  • The Consolidation Stage (Fall 2005 – present). By this time, Google had enough filters and perfected them to the level when they would detect most of the invalid clicking activities in the Left Part of the Zipf distribution (see Figure 1) and some of the attacks in the Long Tail. They would still miss more sophisticated attacks in the Long Tail, and the Click Quality team continued working on the neverending process of improving their filters to detect and prevent new attacks. The Click Quality team has also been working on enhancing their infrastructure and improving their processes....

What? Click fraud wasn't under control until the end of 2005, yet Google is said to have acted reasonably by the report? How does this make sense? The best explanation seems to be that as the report goes on, the author feels click fraud was an evolving problem, and that Google was reasonably reacting to prevent it even though it wasn't "under control" until the end of last year. In contrast, had Google been doing nothing, then it might have been deemed not to have been taking reasonable steps to gain control.

Page 32 looks at the early days and notes that for a year and a half, no new filters were added other than the three original ones that CPC-based AdWords started with. Why? Maybe click fraud was less understood at that time since it was so new (though Search Engine Watch was citing articles on the problem like this one from Wired as far back as 2001). That's one suggestion, along with Google having fewer resources, lacking the right infrastructure or click fraud being on a smaller scale. But these are all guesses, since as the author notes (again, I've bolded a key part):

Not a single person on the Click Quality team was either around or involved in the click fraud detection back in 2002. The only person from this era who is still at Google is on an extended leave and was not available for comments during my visits to Google.

It is hard to judge reasonableness of Google’s invalid click detection efforts between 2002 and summer 2003 because there is simply not enough information available for this time period for me to form an informed judgment about this matter. One exception is the doubleclick policy that I have described before. As I have already stated, the second click in the doubleclick is invalid in my opinion, and Google should have identified it as such well before March 2005 (however, the detection and filtering out the third, fourth and other subsequent clicks was there since the introduction of the PPC model, and advertisers were not charged for these extra clicks).

Again, I get confused by the report declaring that Google operated reasonably when it also states that it can't judge if it indeed acted reasonably for part of the claim period.

The middle period finds progress with far more confidence, as covered on page 33:

The Formation Stage (Summer 2003 – Fall 2005). This stage started with the introduction of the AdSense program in March 2003 and the formation of the Google Click Quality team in the Spring/Summer 2003 (the first person was hired in April 2003 with the mandate to form the Click Quality team; several people joined the team during the summer of 2003, and the initial “core” team consisting of Operations and Engineering groups was consolidated by Fall 2003).

During this time period, two new filters were introduced in Summer 2003 and one more in January 2004. These three new filters remedied several problems that existed since the launch of the first three filters and significantly advanced Google’s invalid click detection efforts. Besides the development of new and better filters, there was a separate effort launched to develop the whole infrastructure for doing the offline analysis of invalid clicks and managing customer inquiries about invalid clicks and billing charges.

Despite all these efforts, the new filters and the offline analysis methods still failed to detect some of the more sophisticated attacks (presumably from the Long Tail of the Figure 1) launched against the Google Network in 2004 and the first half of 2005. In response to these activities and as a part of the overall invalid click detection effort, Google engineers introduced some additional filters around Winter and Spring 2005, including the filter identifying the second immediate click in a doubleclick as invalid.

As a result of all of these efforts by the Click Quality team, a significant progress has been made in combating invalid clicking activities and developing mature systems and processes to accomplish this task. Although the Click Quality team’s solutions were still not perfect, based on the information provided to me by Google, I reached the conclusion that the invalid clicking problem at Google was “under control” by the end of 2005.

And overall filtering is given this conclusion at the top of page 35:

Google put much effort in developing infrastructure, methods and processes for detecting invalid clicks since the Click Quality team was established in 2003. These efforts were not perfect since Google missed certain amounts of invalid clicks over these years and it adhered to the doubleclicking policy for too long in my opinion. However, click fraud is a very difficult problem to solve, Google put a significant effort to solve it, and I find their efforts to filter out invalid clicks as being reasonable, especially after the doubleclick policy was reversed in March 2005.

Page 35 then begins looking at "offline" or non-automated ways to find click fraud that's gotten past filters. By page 37, it gets into systems applied to review what happens on some AdSense sites:

Auto-Termination System is an automated offline system for detecting the AdSense publishers who are engaged in inappropriate behavior violating the Terms and Conditions of the AdSense program. It examines online behavior of various publishers and either immediately terminates or warns the publishers who are engaged in the activities that the system finds to be inappropriate.

Interestingly, the system is still relatively new, only about a year old, as explained on page 38:

The first prototype of the auto-termination system was built in the early 2005 and the system was launched in the summer 2005. Recently, Google has developed major enhancements to the current version of the auto-termination system deploying an alternative set of technologies.

Page 38 also starts a look at the manual review that the click fraud team does, with this positive summary coming on page 40:

I have personally observed several such inspections and can attest to how successfully they have been conducted by Google’s investigators. This success can be attributed to (a) the quality of the inspection tools, (b) the extensive experience and high levels of professionalism of the Click Quality inspectors, and (c) the existence of certain investigation processes, guidelines and procedures assisting the investigators in the inspection process.

However, using humans also poses a bottleneck, as covered on page 41:

My only concern with these manual inspections is about scalability of the inspection process. Since the number of inquiries grows rapidly, so does the number of inspections required to investigate these inquiries. As stated before, Google tries to automate this process by letting software systems do a sizable number of inspections. Still, the number of manual inspections keeps growing significantly over time, based on the numbers that I have seen. This means that Google has a challenging task of expanding and properly training its team of inspectors to assure rapid high-quality inspections of inquiries in the future.

Page 41 also revisits the tug-of-war between advertisers wanting more transparency and Google trying to protect against click fraud by not giving too much information away:

One of the complaints about Google’s investigation system that I keep hearing is that Google is quite secretive and does not provide meaningful explanations of the inspection results neither to the advertisers nor to the publishers. After examining how their inspection systems work, I can understand this secrecy. If Google provides such explanations, then the unethical users can gain additional insights into how Google invalid click detection methods work and would be able to “game” their detection methods much better, thus creating a possibility of massive click fraud. To avoid these problems, Google prefers to be secretive rather than to risk compromising their detection systems and the advertiser base.

And there's this interesting tidbit on how advertisers apparently get refunds when someone gets kicked out of AdSense:

Finally, I would like to point out that when Google terminates an AdSense publisher, all the clicks generated at that publisher’s site over a certain time period (valid and invalid) are credited to the advertisers whose ads were clicked on that site....

How well are things going? That begins to be addressed at the bottom of page 41, and here's a key statement from page 42:

The number of inquiries about invalid clicks for the Click Quality team increased drastically since late 2004. However, the number of refunds for invalid clicks provided by Google did not change significantly over the same time period. Therefore, the number of refunds per inquiry decreased drastically since late 2004. Since each inquiry about invalid clicks leads to an investigation, this means that significantly fewer investigations result in refunds. This statistic can be interpreted in several ways. First, it can be an indication that Google’s invalid click detection methods have significantly improved over this time period and that reactive investigations do not find any problems when searching for invalid clicks. Second, this statistic can mean that Google tightened its refund policies and is less generous with its refunds than it used to be. Third, this statistic can mean that more advertisers are looking more carefully into their logs and are more suspicious about invalid clicks since this problem received wide attention in the media and the public discourse in general. Therefore, they may request Google to investigate suspicious clicking activities even if nothing really happened. I examined investigative activities of the Google Click Quality team and can attest that it consists of a group of highly professional employees who do their investigations carefully and professionally. Therefore, I do not believe in the second reason stated above. The third reason is quite possible since advertisers are indeed concerned about invalid clicks and request Google to investigate suspicious clicking activities more frequently than before. However, the number of inquiries increased so significantly that I would expect that the number of refunds would also increase somewhat. Since this did not happen, I attribute this effect to the fact that Google’s invalid click detection methods work reasonably well by now.

I've bolded the parts most important to me. The expert is saying that more advertisers are raising inquiries, probably because of increased concerns (which we know is the case from various surveys over the past two years) but that Google isn't refunding more. Nor is that Google just protecting itself, the expert says. To him, it's a case that the concerns aren't matching the reality. Click fraud -- bad clicks getting past Google -- does not appear to be on the rise.

Nor is click fraud getting past filters a major problem compared to the amount Google is proactively catching, the expert says:

The total amount of reactive refunds that Google provides to advertisers as a result of their inquiries is miniscule in comparison to the potential revenues that Google foregoes due to the removal of invalid clicks (and not charging advertisers for them).

Another interesting part is how Google is comparing traffic across its network to that from within Google.com, which is said to be a "gold standard" of a pure site. The network is said to compare well:

Another indirect piece of evidence provided to me by Google is that Conversions-Per-Dollar (CPD) rates on various partner sites of Google Network are not significantly lower than on their “flagship” Google.com site. CPD is the statistic determining the number of conversions that occurred divided by the dollar amount spent on advertising. This statistic shows how effective advertising campaigns are for the advertisers. Since Google spent much effort over the past 4.5 years to make sure that Google’s AdWords program works reasonably well, it now serves as the “golden standard” against which other programs are compared at Google. Since CPD numbers for other parts of the Google Network approach that of at Google.com, this is an indication that other advertising programs work as well as AdWords works on Google.com. Since other parts of the Google Network are affected by invalid clicking activities significantly more than Google.com, this is an indication to the Click Quality team that their efforts to combat fraud on other parts of the Google Network are as effective as on Google.com.
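To make the CPD comparison concrete, here's a minimal sketch of the arithmetic the report describes: conversions divided by dollars spent, compared between Google.com and a partner site. The function name and every figure below are my own invention for illustration; the report gives no actual CPD numbers.

```python
def conversions_per_dollar(conversions: int, spend_dollars: float) -> float:
    """CPD as the report defines it: conversions divided by ad spend."""
    if spend_dollars <= 0:
        raise ValueError("spend must be positive")
    return conversions / spend_dollars


# Hypothetical figures, invented purely for illustration.
google_com_cpd = conversions_per_dollar(conversions=50, spend_dollars=1000.0)
partner_cpd = conversions_per_dollar(conversions=45, spend_dollars=1000.0)

# How close the partner site comes to the Google.com baseline.
relative_to_baseline = partner_cpd / google_com_cpd
```

If undetected invalid clicks were inflating spend on the partner site without producing conversions, you'd expect that ratio to fall well below 1. That's the logic behind treating a similar CPD as indirect evidence.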

At the bottom of page 43 is an overall conclusion that Google's doing a reasonable job with detection, as best as this scientist can tell. It also takes some slams at general press reports of click fraud being widespread, saying they haven't been proven true or false yet. I've bolded the key paragraph for all this below:

As a scientist, I am accustomed to seeing more direct, objective and conclusive evidence that certain methods and approaches “work.” Having said this, I fully understand the difficulties of obtaining such measures for invalid clicks by Google, as previously discussed in this report. Moreover, one can challenge most of the reports pertaining to invalid clicking rates published in the business press by questioning their methodologies and assumptions used for calculating these rates. Most of these reports would not stand hard scientific scrutiny.

Still, as a scientist, it is hard for me to arrive at any definitive conclusions beyond any reasonable doubt based on Points (1) – (6) above that Google’s invalid click detection methods “work well” and remove “most” of the invalid clicks – the provided evidence is simply not hard enough for me, and I am used to dealing with much more conclusive evidence in my scientific work.

Having said this, the indirect evidence (1) – (6) specified above, nevertheless, provides a sufficient degree of comfort for me to conclude that these filters work reasonably well. Finally, this statement should not be interpreted as if I find Google’s effort to detect invalid clicks (a) unreasonable, or (b) not working reasonably well. It only states that Google did not provide a compelling amount of conclusive evidence demonstrating the effectiveness of their approach that would satisfy me as a scientist.

Finally, the measures (1) – (6) above are only statistical measures providing some evidence that Google’s filters work reasonably well. This does not mean, however, that any particular advertiser cannot be hurt badly by fraudulent attacks, given the evidence that Google filters “work.” Since Google has a very large number of advertisers, one particular bad incident will be lost in the overall statistics. Good performance measures indicative that filters work well only mean that there will be “relatively few” such bad cases. Therefore, any reports published in the business press about particular advertisers being hurt by particular fraudulent attacks do not mean that the phenomenon is widespread. One simply should not generalize such incidents to other cases and draw premature conclusions – we simply do not have evidence for or against this.

Page 44 has a section that restates conclusions in terms of economic aspects -- IE, any economic motivation for Google to hide or ignore click fraud:

First of all, most of the revenue that Google foregoes due to discarding invalid clicks comes from the filters since they identify most of the invalid clicks. The second source of the forgone revenues comes from the terminated AdSense publishers (as stated before, all the clicks made on the terminated publisher’s website generated over a certain time period are credited back to the advertisers regardless of whether they are valid or invalid). However, this second type of revenue is relatively small in comparison to the foregone revenues due to filters. The third source of the foregone revenues comes from the AdWords credits. However, these AdWord credits are miniscule in comparison to the other sources of foregone revenues. In summary, the most significant source of foregone revenues, by far, are Google filters. Hence their performance is the most crucial factor for the whole invalid click detection program (note that this observation does not mean that Google focuses mainly on this part of the invalid click detection program since other parts are also important)....

It makes no business sense for Google to go after these extra revenues and that the best long-term business policy for Google is to protect advertisers against invalid clicks. Policy reversal on the doubleclick is a good example of this. By not charging advertisers for the doubleclick since March 2005, Google lost a “noticeable” amount of revenues. However, the revenues lost as a result of this action are insignificant in comparison to the revenues that Google risks to lose if it loses trust of the advertisers. Therefore, reversing the doubleclick policy makes sense not only from the legal, ethical and public relations point of view, but it is also a sound economic decision.

Finally, the beginning of page 46 gives this overall conclusion:

Google has built the following four “lines of defense” against invalid clicks: pre-filtering, online filtering, automated offline detection and manual offline detection, in that order. Google deploys different detection methods in each of these stages: the rule-based and anomaly-based approaches in the pre-filtering and the filtering stages, the combination of all the three approaches in the automated offline detection stage, and the anomaly-based approach in the offline manual inspection stage. This deployment of different methods in different stages gives Google an opportunity to detect invalid clicks using alternative techniques and thus increases their chances of detecting more invalid clicks in one of these stages, preferably proactively in the early stages.

Since its establishment in the Spring and Summer of 2003 the Click Quality team has been developing an infrastructure for detecting and removing invalid clicks and implementing various methods in the four detection stages described above. Currently, they reached a consolidation phase in their efforts, when their methods work reasonably well, the invalid click detection problem is “under control,” and the Click Quality team is fine-tuning these methods. There is no hard data that can actually prove this statement. However, indirect evidence provided in this report supports this conclusion with a moderate degree of certainty. The Click Quality team also realizes that battling click fraud is an arms race, and it wants to stay “ahead of the curve” and get ready for more advanced forms of click fraud by developing the next generation of online filters.

In summary, I have been asked to evaluate Google’s invalid click detection efforts and to conclude whether these efforts are reasonable or not. Based on my evaluation, I conclude that Google’s efforts to combat click fraud are reasonable.

Posted by Danny Sullivan at 1:58 PM | Permalink

The Abridged Version: Independent Report On Google's Click Fraud Detection Practices

Last Friday, an independent report on how Google deals with click fraud was published as part of the ongoing Lane's Gifts v. Google class action lawsuit over click fraud. To my knowledge, it is the most comprehensive, detailed public look into how Google deals with click fraud that's ever come out. It finds that Google's efforts to combat the issue have been reasonable, though there are some eyebrow-raising bits on how the author only finds the situation was in control by the end of 2005 and how it's impossible to fully know whether some clicks are invalid -- and thus potentially impossible to prevent some types of fraud through purely automated means.

The report is long, a 47-page PDF file. Anyone interested in click fraud issues should give it a thorough read. But given how everyone's always busy, I thought I'd highlight below a number of sections that stood out in my review of the document.

The report is by Dr. Alexander Tuzhilin, Professor of Information Systems at New York University. To prepare it, he says in the Executive Summary at the beginning (page 1):

I have been asked to evaluate Google’s invalid click detection efforts and to conclude whether these efforts are reasonable or not. As a part of this evaluation, I have visited Google’s campus three times, examined various internal documents, interviewed several Google’s employees, have seen different demos of their invalid click inspection system, and examined internal reports and charts showing various aspects of performance of Google’s invalid click detection system. Based on all these studied materials and the information narrated to me by Google’s employees, I conclude that Google’s efforts to combat click fraud are reasonable. In the rest of this report, I elaborate on this point.

Immediately, the first thing that comes to mind is that he makes no mention of talking with individual advertisers, which could lead you to think that if he's only talking with Google, of course he's likely to come away with the idea that Google is doing everything just fine.

When you read the report, it's clear this isn't the case. Google does come under criticism. It's also important to realize Tuzhilin was not employed by Google to create this report. He's an independent expert appointed to my knowledge by the court. Exactly how he was selected is unclear, and I do think it would be a better report if advertiser data had been involved. But there's still plenty of good stuff here to digest.

Page 2 covers his background and materials reviewed from Google to prepare the report.

Page 3 and some of page 4 cover those he talked with at Google. Interesting details are that Google's click quality team consists of about 36 people, one-third engineers looking to design detection systems and the remaining two-thirds dedicated to doing manual investigations of suspected fraud.

Pages 4 through 6 cover the history of the internet, search engines and Google, most of which isn't that necessary for most experienced search marketers. Page 7 talks about three main ways of purchasing advertising:

  • CPM - cost per thousand impressions
  • CPC - cost per click
  • CPA - cost per action
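For anyone who wants the arithmetic spelled out, here's a rough sketch of how the three models bill, assuming CPM is charged per thousand impressions as is conventional. The rates and campaign numbers are hypothetical, picked only so the three bills match before any invalid clicks show up.

```python
def cpm_cost(impressions: int, rate_per_thousand: float) -> float:
    """Bill for impressions, priced per thousand shown."""
    return impressions / 1000 * rate_per_thousand


def cpc_cost(clicks: int, rate_per_click: float) -> float:
    """Bill for every click charged to the advertiser."""
    return clicks * rate_per_click


def cpa_cost(actions: int, rate_per_action: float) -> float:
    """Bill only for completed actions (conversions)."""
    return actions * rate_per_action


# Hypothetical campaign: 100,000 impressions, 1,000 clicks, 20 conversions.
clean_cpm = cpm_cost(100_000, 5.00)
clean_cpc = cpc_cost(1_000, 0.50)
clean_cpa = cpa_cost(20, 25.00)

# Same campaign with 200 extra invalid clicks that never convert.
attacked_cpc = cpc_cost(1_200, 0.50)
attacked_cpa = cpa_cost(20, 25.00)
```

In this made-up case the CPC bill rises from 500 to 600 dollars when the bad clicks arrive, while the CPA bill stays put. That's the whole argument for CPA being less exposed to click fraud, though it says nothing about whether actions themselves can be faked.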

Again, basic stuff. But it's worth touching on because of some of the current debate that Google and other search engines will be forced to go to CPA pricing to fully eliminate fraud.

On page 8, Tuzhilin lends some support to this, or at least to the problems that others have raised with CPC:

Although currently popular, the CPC/PPC model has two fundamental problems:

  • Although correlated, good click-through rates (CTRs) are still not indicative of good conversion rates, since it is still not clear if a visitor would buy an advertised product once he or she clicked on the ad. In this respect, the CPA-based models provide better solutions for the advertisers (but not necessarily for the search engines), since they are more indicative that their ads are “working.”  
  • It does not offer any “built-in” fundamental protection mechanisms against the click fraud since it is very hard to specify which clicks are valid vs. invalid in general, as will be explained in Section 8 (it can be done relatively easily in some special cases, but not in general). For this reason, major search engines launched extensive invalid click detection programs and still face problems combating click fraud.

In response to these two problems and for various other business reasons, Google is currently testing a CPA payment model, according to some reports in the media. Some analysts believe that the conversion-based CPA model is more robust for the advertisers and also less prone to click fraud. Therefore, they believe that the future of the online advertising payments lies with the CPA model. Although this is only a belief that is not supported by strong evidence yet, Google is getting ready for the next stage of the online advertising “marathon.”

What Will Replace Pay-Per-Click Advertising? over at Publishing 2.0 from Scott Karp is a good roundup and debate on some of the issues of CPA perhaps as the solution to CPC issues.

I've posted lots of comments in Karp's post, but my personal view is this. Currently, Google is offering all three major payment systems: CPC, CPM and CPA. It is offering all three not just because of fraud issues but because advertisers have different goals with advertising, where different payment models may be required.

Building brand? You want impressions perhaps more than clickthrough, and suddenly CPM makes sense. Really savvy with conversion tracking? CPA might make more sense for you, as a way for you to feel less likely to be exposed to fraud and more likely to really be paying only for key traffic. Fairly rudimentary with conversion tracking? Doing low-cost CPC ads might make a lot of sense, for your situation. And beyond the three big ones, I'm sure we'll see other options emerge. The unifying goal around all of them, from Google's perspective, will be figuring out a way to help advertisers track that the ads are working according to some type of metrics that the advertisers want.

Skipping down past background on how AdWords works and the AdSense program (AdSense For Domains doesn't get mentioned, though it's a major program), page 13 starts in on what Google can tell about clicking activities.

Google is apparently making use of conversion data that advertisers provide to determine if fraudulent clicks are happening. My understanding was that conversion data was supposed to be ringfenced and not used by Google for anything, not even in the aggregate. But perhaps the policy has changed or perhaps I misunderstood this. I'll check on that (and also note that confusingly, the report says on page 34 that "None of the filters uses the conversion information that Google collects"). Certainly Google made no such restrictions when it launched Google Checkout. But even with conversion data, the report notes using this info isn't perfect.

Google collects various types of information about querying and clicking activities, including certain types of “post-clicking” data about conversion actions on the advertiser’s website where the visitor is taken following the click. All this data accumulated by Google is extracted from various sources and contains comprehensive information about visitor’s activities on the Google Network.

As stated before, the conversion data – the “post-clicking” data about conversion actions on the advertiser’s website – constitutes an important piece of this collected data. In particular, if the advertiser formally agrees to provide this information, Google collects data on whether or not the user visited certain designated pages on the advertised website that the advertiser marked as “conversion” pages, such as the checkout page and certain form filling pages. This conversion data is limited to what the advertiser decided to provide to Google and is not as rich as the clickstream data collected by advertisers themselves on their websites. Also, many advertisers decide to opt out from providing this conversion data. In this case, Google does not have any conversion information and therefore does not know what happened after a visitor clicked on the ad. Nevertheless, this post-clicking conversion data is important for Google even in its limited form because it conveys some intentions of the visitors on the advertised website and provides good insights into whether or not the visitor is seriously considering purchasing the advertised product or service....

This “raw” clicking data described above is subsequently cleaned, preprocessed and stored in various internal logs by Google for different types of subsequent analysis conducted on this data.

One inherent weakness of Google’s (or any other search engine) data collection effort that is important for detecting invalid clicks, is inability to get full access to all the clicking activities of the visitors of the advertised website. In other words, the conversion data that Google collects provides only a partial picture of all the post-clicking activities of the visitor on the advertised website. This data is important for detecting invalid clicks since better invalid click detection methods can be developed using this data. Unfortunately, Google (and other search engines) does not have full access to this data, unless the advertised website decides to provide its clickstream data to Google, which many websites are reluctant to do. However, this is not Google’s fault – this is an inherent limitation of the types of data available to Google.

While it might not be perfect, the report also notes at the end of this section that no one has the perfect collection of information:

However, this lack of full conversion data available to Google is compensated by various types of querying and clicking data that Google can collect, whereas advertisers and third-party vendors cannot. Therefore, there exists a tradeoff between the types of data relevant for detecting invalid clicks that is available to Google, advertisers and the third-party vendors. None of these three groups have the most comprehensive set of data pertinent to detecting invalid clicks, and each of them needs to settle for the invalid click detection methods possible only with the data that they have.

On page 14, the report addresses the frustration advertisers feel over the relatively non-granular nature of Google's reporting versus Google's need to keep some things carefully protected:

The smallest unit of analysis is one day. For example, the number of invalid clicks on an ad detected by Google (or any other related statistic) can only be reported on a daily basis (although there are certain alternative methods of obtaining aggregation granularity that is smaller than a day). In other words, advertisers cannot know if a particular click on a particular ad was marked as valid or invalid by Google, and Google refuses to provide this information to advertisers.

This is a source of contention and dispute between Google and the advertisers, and one can understand both parties in this dispute. On one hand, the advertiser has the right to know why a particular click was marked as valid by Google (when the advertiser thinks that it is invalid) because the advertiser pays for this click. On the other hand, if Google discloses this information, it opens itself to click fraud on a massive scale because, by doing so, it provides certain hints about how its invalid click detection methods work. This means that unethical users will immediately take advantage of this information to conduct more sophisticated fraudulent activities undetectable by Google’s methods.

This conflicting dilemma between advertisers’ right to know and Google’s inability to provide the appropriate information to advertisers because of the security concerns is part of the Fundamental Problem of the PPC advertising model to be discussed in the next section. More recently, Google tried to bridge this gap between Google and the advertisers.

Page 15 spends time looking at various definitions of click fraud, bringing us to page 16 which raises the bigger issue that it is impossible to know the intent of ALL clicks, which is crucial to understand what chunk of them might be fraudulent:

Unfortunately, in several cases it is hard or even impossible to determine the true intent of a click using any technological means. For example, a person might have clicked on an ad, looked at it, went somewhere else but then decided to have another look at the ad shortly thereafter to make sure that he/she got all the necessary information from the ad. Is this second click invalid? To make things even more complicated, the second click may not be strictly necessary since the person remembers the content of the ad reasonably well (hence there is no real need for the second click). However, the person may not really like or care about the advertiser and decides to make this second click anyway (to make sure that he/she did not miss anything in the ad and his/her information is indeed correct) without any concerns that the advertiser may end up paying for this second click (since the person really does not care about the advertiser and his/her own interests of not missing anything in the ad overweigh the concerns of hurting the advertiser). Therefore, in some cases the true intent of a click can be identified only after examining deep psychological processes, subtle nuances of human behavior and other considerations in the mind of the clicking person.

Soon after this, on page 17, comes the first real bombshell to me. As said above, you can't detect the intent of all clicks. Given this, there's no reasonable way to be certain that technological fixes for click fraud detection are working:

In summary, between the obviously clear cases of valid and invalid clicks, lies the whole spectrum of highly complicated cases when the clicking intent is far from clear and depends on a whole range of complicated factors, including the parameter values of the click. Therefore, this intent (and thus the validity of a click based on the above definitions) cannot be operationalized and detected by technological means with any reasonable measure of certainty.

What? Didn't the report find Google was acting reasonably? Yes, and I think that's because, as the report goes on to explain, Google's not relying solely on automated means to stop click fraud. If automation were all it used, some clicks might get through.

Page 18 picks up the issue even more strongly, and I've bolded this section because it deserves special attention. Note that the italics were originally included:

The last statement has one important implication: given a particular click in a log file, it is impossible to say with certainty if this click is valid or not in all the cases. This means that

  • It is impossible to measure the true rates of invalid clicking activities, and all the reports published in the business press are only guesstimates at best.  
  • The invalid click detection methods need to be developed without a proper operationalizable conceptual definition of invalid clicks.

The important word above is all the cases since in some cases it can be stated with certainty if a particular click is valid or not. For example, it is easy to detect a doubleclick using relatively simple technological means, assuming that the doubleclick is invalid.
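As an example of one of those "relatively simple technological means," here's a minimal sketch of a doubleclick check: flag any click that repeats the same visitor and ad within a short window of the previous one. The `Click` fields, the visitor identifier and the two-second window are all my assumptions. The report doesn't disclose how Google actually defines or detects a doubleclick.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Click:
    visitor_id: str   # hypothetical identifier, e.g. a cookie or IP address
    ad_id: str
    timestamp: float  # seconds since some reference point


def flag_doubleclicks(clicks, window_seconds=2.0):
    """Return indices of clicks that repeat a visitor/ad pair too quickly.

    A click is flagged when the previous click from the same visitor on the
    same ad happened no more than `window_seconds` earlier (assumed value).
    """
    last_seen = {}
    flagged = []
    # Walk the clicks in time order while remembering their original index.
    for index, click in sorted(enumerate(clicks), key=lambda p: p[1].timestamp):
        key = (click.visitor_id, click.ad_id)
        previous = last_seen.get(key)
        if previous is not None and click.timestamp - previous <= window_seconds:
            flagged.append(index)
        last_seen[key] = click.timestamp
    return flagged


sample_log = [
    Click("visitor-1", "ad-A", 0.0),
    Click("visitor-1", "ad-A", 0.5),   # immediate repeat: flagged
    Click("visitor-2", "ad-A", 0.6),   # different visitor: not flagged
    Click("visitor-1", "ad-A", 60.0),  # a minute later: not flagged
]
flagged_indices = flag_doubleclicks(sample_log)
```

Notice that the click a minute later sails through. Whether that one is a sincere second look or something else is exactly the intent question the report says technology can't settle.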

Again, it seems to be a case that automation can catch some, perhaps lots of click fraud, but it can't catch all of it because of the intent problem. Also crucial in the above is the stressing that rates we've been given from various sources are simply guesses, since the intent of clicks isn't known to some of these other sources.

Indeed, in the case of the recent Outsell report, you don't even have to worry about figuring out the intent of particular clicks. Click fraud stats from that report come from half the panel entirely guessing about what click fraud rates they might have -- guessing, because that half does no auditing of clicks at all.

Page 19 deals with ways of identifying invalid clicks, at least according to operational approaches -- IE, automated criteria. Do the clicks:

  1. Show an anomaly from past clicking patterns for a site or ad?
  2. Violate certain predefined rules?
  3. Fall into certain classes of behavior that make them deemed invalid?
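To give a feel for the first two approaches, here's a generic sketch of a rule-based check and an anomaly-based check. These are textbook-style illustrations, not Google's filters, which the report deliberately doesn't describe. The per-IP limit, the z-score threshold and the function names are all assumptions of mine.

```python
from statistics import mean, stdev


def violates_rule(clicks_from_ip_today: int, max_per_ip: int = 20) -> bool:
    """Rule-based check: too many clicks from one IP in a day (assumed limit)."""
    return clicks_from_ip_today > max_per_ip


def is_anomalous(history, today, z_threshold=3.0):
    """Anomaly-based check: is today's count far from the historical pattern?

    Uses a simple z-score against past daily click counts for a site or ad.
    """
    if len(history) < 2:
        return False  # not enough history to judge
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return today != mu
    return abs(today - mu) / sigma > z_threshold


# Hypothetical daily click counts for one ad.
past_days = [100, 110, 90, 105, 95]
spike_flagged = is_anomalous(past_days, today=400)
normal_flagged = is_anomalous(past_days, today=108)
```

With these made-up numbers, a day with 400 clicks gets flagged and a day with 108 doesn't. Neither check knows anything about why someone clicked, which is the limitation the report keeps coming back to.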

Page 20 explains that Google primarily depends on the first two approaches -- looking for anomalies and using rules -- but then gets into what it stresses as the "Fundamental Problem" of fraudulent clicks:

We conclude that there is a fundamental problem associated with the definition of invalid clicks for the Pay-per-Click model. This problem can be summarized as follows:

  • There is no conceptual definition of invalid clicks that can be operationalized in the sense defined above.  
  • An operational definition cannot be fully disclosed to the general public because of the concerns that unethical users will take advantage of it, which may lead to a massive click fraud. However, if it is not disclosed, advertisers cannot verify or even dispute why they have been charged for certain clicks.

This problem lies at the heart of the click fraud debate and constitutes the main problem of the CPC model: it is inherently vulnerable to click fraud.

Page 21 poses solutions to the problem:

  • The “trust us” approach of the search engines. The search engines can assure advertisers that they are doing everything possible to protect them against the click fraud. This is not easy because of the inherent conflict of interest between the two parties: the money from invalid clicks directly contribute to the bottom lines of the search engines. Nevertheless, it may be possible for the search engines to solve this trust problem by developing lasting relationships with the advertisers. However, the discussion of how this can be done lies outside of the scope of this report.  
  • Third-party auditors. Independent third-party vendors, who have no financial conflicts of interest, can work with advertisers and audit their clickstream files to detect invalid clicks.

These two approaches would still constitute only a partial solution to the Fundamental Problem because there is no conceptual definition of invalid clicks that can be operationalized.

Page 21 continues on looking at how Google does click fraud detection, covering a range of general preventative measures and more active things done when clicks actually happen.

On page 23, a look at filtering systems begins, ending with this summary that's positive for Google, at the moment. It also stresses that filtering will always come under new challenges:

The current set of Google filters is fairly stable and only requires periodic “tuning” and “maintenance” rather than a radical re-engineering, even when major fraudulent attacks are launched against the Google Network. It also demonstrates that various recent efforts of the Click Quality team to improve performance of their filters produce only incremental improvements. Thus, the Click Quality team currently reached a stability point since additional efforts to enhance filters produce only marginal improvements.

Having said this, the Click Quality team also realizes that this is only a local stability point in the sense that major future modifications in clicking patterns of online users and new types of fraudulent attacks against Google can lead to radically new types of invalid clicks that the current set of filters can miss. Therefore, the Click Quality team is working on the next generation of more powerful filters that will monitor a broader set of signals and more complex monitoring conditions. These new filters will require a more powerful computing infrastructure than is currently available, and the Click Quality team also participates in developing this infrastructure. Their overall goal is to make click spam hard and unrewarding for the unethical users thus making it uneconomical for them and turning many of them away from Google and the Google Network.

At page 28, the expert notes that Google's filters are relatively simple in nature, yet they work:

The structure of most of Google’s filters, with a few exceptions, is surprisingly simple. I was initially puzzled and thought that Google did not do a reasonable job in developing better and more sophisticated filters. I was initially certain that these simple filters should miss many types of more complicated attacks. However, the evidence reported in the previous two sections indicates that these simple filters perform reasonably well.

Why? A variety of reasons, such as unsophisticated attacks:

Although some of the coordinated attacks can be quite sophisticated, the majority of the invalid clicks usually come from relatively simple sources and less experienced perpetrators....Still, there are certain types of attacks that Google filters will miss; but these attacks should be quite sophisticated and would require significant ingenuity to launch. Therefore, there cannot be too many of these, unless perpetrators become much more imaginative....

The Long Tail / Search Tail even gets a mention, with the idea being that -- if I understand correctly -- most activity focuses around the same type of things that the filters work well to detect. IE, the filters do well at cutting off the head of click fraud -- and if tail activity gets through, it's relatively little in comparison:

Despite its current reasonable performance, this situation may change significantly in the future if new attacks will shift towards the Long Tail of the Zipf distribution by becoming more sophisticated and diverse.
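To see why filters aimed at the head could cover most of the volume, here's a small sketch of how much of a Zipf distribution sits in its top ranks. The exponent of 1, the 1,000 attack types and the head of 100 are my assumptions for illustration only. The report names the Zipf distribution but gives no parameters.

```python
def zipf_head_share(n_types: int, head_size: int, exponent: float = 1.0) -> float:
    """Fraction of total volume contributed by the `head_size` top-ranked types.

    Assumes the volume of the type at rank r is proportional to 1 / r**exponent.
    """
    weights = [1 / rank ** exponent for rank in range(1, n_types + 1)]
    return sum(weights[:head_size]) / sum(weights)


# Hypothetical: 1,000 distinct attack types, filters tuned to the top 100.
head_share = zipf_head_share(n_types=1000, head_size=100)
tail_share = 1 - head_share
```

Under those assumptions the top tenth of attack types accounts for roughly 69 percent of the volume. The report's warning is that this comfort disappears if attackers spread out into the tail.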

At the bottom of page 29, the report starts examining whether Google is letting stuff slide to earn more money:

Since Google does not charge advertisers for invalid clicks, this means that it loses money by filtering out these clicks. Thus, there is a financial incentive for Google not to forgo some of these revenues and simply be “easy” on filtering out invalid clicks. Therefore, it is important to know if any business considerations entered into the filter specification process or is it entirely determined by Google’s engineers in an objective manner with a single purpose to protect the advertiser base. This is one of the important issues that I investigated as a part of my studies of how Google manages detection of invalid clicks....

The conclusion is that Google isn't trying to favor itself:

I have spent a significant amount of time trying to understand who sets these threshold parameters, how, and what are the procedures and processes for setting them. In particular, I tried to understand if it is an entirely engineering decision that tries to protect the advertisers from invalid clicks or any of the business groups at Google are involved in this decision process with the purpose of influencing it towards generating extra revenues for Google.

As a result of these investigations, I realized that it constitutes exclusively an engineering decision with no inputs from the finance department or the business units, except the following two cases:

  • The first one was a special case when one particular IP address was disabled because of inappropriate clicking activities, and a business unit requested the Click Quality team to conduct an additional investigation since it was an important customer associated with that IP address, and restore it if the investigation results were negative. When I was explained what had happened, I felt that Google’s actions were reasonable in this particular situation.  
  • The change in the doubleclick policy that was considered in Winter 2005 and implemented in March 2005. It turned out that the change in the doubleclick policy (i.e., not to charge advertisers for the immediate second click in a doubleclick) had non-trivial financial implications for Google. Being a publicly traded company at that time, this change would have had a noticeable effect on Google’s total revenues with corresponding implications for the financial performance of the company. Therefore, this policy change had legitimate concerns for Google’s management, and these financial implications have been discussed in the company. Still, despite its noticeable negative effects on its financial performance, Google decided to abandon the old doubleclick policy and not to charge advertisers for the second click, which was an appropriate action to take.

In conclusion, with the exception of the doubleclick, I found Google’s processes for specifying filters and setting parameters in these filters driven exclusively by the consideration to protect the advertiser base, and, therefore, being reasonable.

Doubleclick constitutes a special case. For me, the second click in the doubleclick is invalid, as I argued in Section 8, and the advertisers should not be charged for it. It is not clear to me why it took Google so long to revise the policy of charging for doubleclicks. Nevertheless, this policy was revised in March 2005 despite the fact that the company lost “noticeable” revenues by taking this action.

I find the conclusion that Google wasn't trying to benefit itself doesn't mesh well with the expert's own concern/confusion/uncertainty about why Google took so long to change its policy on doubleclicks. Moreover, that entire policy isn't well explained. Way back up on page 20, there's this very brief mention:

It turns out that Google had a history associated with the definition of a doubleclick: at some point doubleclick was considered to be a valid click and advertisers were charged for it, while subsequently Google reconsidered and treated doubleclick as invalid.
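
To make the revised doubleclick policy concrete, here is a minimal sketch in Python of a rule that does not charge for the immediate second click. It is only an illustration, not Google's actual filter: the report excerpts quoted here do not say how "immediate" is measured, so the two-second window, the `Click` fields and the `billable_clicks` function are all assumptions of mine.

```python
from dataclasses import dataclass

# Assumption: the quoted report does not state the time window that
# defines a doubleclick; 2 seconds is a made-up illustrative value.
DOUBLECLICK_WINDOW_SECONDS = 2.0


@dataclass(frozen=True)
class Click:
    visitor_id: str   # hypothetical identifier for whoever clicked
    ad_id: str        # hypothetical identifier for the ad
    timestamp: float  # seconds since some reference point


def billable_clicks(clicks, window=DOUBLECLICK_WINDOW_SECONDS):
    """Return the clicks to charge for, skipping any click that follows
    the previous click by the same visitor on the same ad within `window`."""
    last_seen = {}  # (visitor_id, ad_id) -> timestamp of the previous click
    billable = []
    for click in sorted(clicks, key=lambda c: c.timestamp):
        key = (click.visitor_id, click.ad_id)
        previous = last_seen.get(key)
        last_seen[key] = click.timestamp
        if previous is not None and click.timestamp - previous <= window:
            continue  # treated as the second click of a doubleclick
        billable.append(click)
    return billable
```

Under the old policy described in the report, both clicks would have been charged. Under a rule like this one, only the first is.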