March 5, 2008

Google: The Spy Who Loved Me

Dr. Hal Varian, Google's chief economist and occasional Freakonomics Blog guest blogger, posted "Why data matters" on the official Google blog, cross-posted on the Google Public Policy Blog.

Varian explains that Web search algorithms are improved by the "wisdom of the crowds" drawn from the "logs of billions of previous search queries." That makes the general public - and government officials - nervous about privacy.

Varian tutors us in PageRank simplified and discusses link building in an ideal world - one where The New York Times and The Wall St. Journal, for example, would link to other sites generously:

"If I have six links pointing to me from sites such as the Wall Street Journal, New York Times, and the House of Representatives, that carries more weight than 20 links from my old college buddies who happen to have web pages."

The House of Representatives? Sounds more like Charlie Wilson's War.

SEOs, contact your local Congressional Representative for paid links - paid for with your hard-earned tax dollars.
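
Varian's six-links-versus-twenty claim is easy to check on a toy web. What follows is a minimal sketch of the textbook, simplified PageRank computed by power iteration; it is not Google's production algorithm, and the page names (`major0`, `reader0_0`, `buddy0`, `me`, `rival`), the damping factor of 0.85, and the iteration count are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over everyone.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: six well-linked "major" sites point at "me",
# while twenty unlinked "buddy" pages point at "rival".
web = {}
for m in range(6):
    major = f"major{m}"
    web[major] = ["me"]
    for r in range(10):
        web[f"reader{m}_{r}"] = [major]
for b in range(20):
    web[f"buddy{b}"] = ["rival"]

rank = pagerank(web)
print("me:", rank["me"], "rival:", rank["rival"])
```

In this toy setup the six links win because each major site has itself collected rank from its own readers, while the buddy pages have nothing to pass along but the baseline share. The model only works, of course, if the major sites actually link out.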

The reality: when Dr. Varian was interviewed, The New York Times Freakonomics Blog linked to Google.org, Google green energy, Dr. Varian's position auction paper (pdf); BBC News on Moore's Law; Paul Seabright (Professor of Economics, University of Toulouse, France); Dr. Varian's NY Times energy article; another Freakonomics blog post; WebMD, Revolution Health, and Paul Anderson, Professor of Security Engineering, University of Cambridge.

That's the way major media outlets and journalists typically link: to each other; to corporate sites; to universities. It's an elite, exclusive club. Nick Carr's "digital elite."

That isn't to say Dr. Varian can't tell a good story. He reveals how Larry and Sergey tried to license their PageRank algorithm to "some of the newly formed web search engines."

No names named. None of the nascent search engines were interested. Since they couldn't sell their algorithm, Brin and Page decided to start a search engine themselves. (Note to VCs: Don't try this business model at home.)

Google has since added more than 200 additional "signals" to the algorithms that determine the relevance of websites to a user's query. We are the signals.

All the background info leads to one conclusion: Google needs your data. Google wants you to take a leap of faith. Google must store and analyze search logs. They want us to believe, "Nobody does it better."

Reminds me of Radiohead via Carly Simon:

"But like heaven above me, the spy who loved me/Is keeping all my secrets safe tonight. And nobody does it better/Sometimes I wish someone would/Nobody does it quite the way you do/Why'd you have to be so good."

Dr. Varian suggests readers "Watch our videos to see exactly what data we store in our logs."

Not everyone has time - or the inclination - to watch Google videos on YouTube.

What worries me: Google doesn't understand us any better than we understand the mathematical formulas of search engine algorithms.

Search Engine WarGames won't be fought between humans and machines.

Nick Carr put it best: "The erosion of the middle class may well accelerate, as the divide widens between a relatively small group of extraordinarily wealthy people - the digital elite - and a very large set of people who face eroding fortunes and a persistent struggle to make ends meet. In the YouTube economy, everyone is free to play, but only a few reap the rewards."

Posted by Kevin Heisler at 12:20 AM | Permalink

December 27, 2007

Google Misses the Mark with Reader Shared Items

This might make the folks at Facebook feel better about the whole Beacon privacy fiasco. It appears that even Google can make a mistake, as they did this month when they made shared items in Google Reader accessible to all Google Talk friends. Without asking. And without an easy way to opt out, short of deleting contacts or not sharing anything.

I don't know if I'd go so far as some, who claim that the move by Google ruined Christmas, but it was an unnecessarily foolish move by Google, which could have been avoided by making the sharing an opt-in decision, instead of an opt-out one.

This week (being a slow news week and all), many bloggers took offense to the move. Some complained that Google is invading their privacy by sharing items with people who they didn't intend to share with. Others blame users for not understanding what "shared" means.

Last night, the product team replied on the Google Reader blog to the "helpful feedback" it received from bloggers. The sharing feature is still automatic and opt-out, but now users can quickly create a new tag for all shared items and then decide which contacts to share those items with.

And a link is presented at sign-in to a page that explains the process in the Reader Help Center: "If for any reason you'd like to start your sharing afresh, you can always remove all your previously shared items. Just go to the Friends Settings and click Move or Clear Shared Items. You will be given an option to select or create a tag and move your shared items to that tag, or clear your shared items. The items will remain in their original feeds along with any tags you've given them, but will no longer be in your shared items feed."

Posted by Kevin Newcomb at 5:28 AM | Permalink

September 28, 2007

Google Hack Gets At Personal Data

Philipp Lenssen has discovered a cross-site scripting (XSS) hack against Google that allows access to personal data, according to Blogoscoped today.

The tests he ran with co-editor Tony Ruscoe show that it is possible to get access to subject lines and the first few words of emails from Gmail, statistical information from Google Analytics, as well as see what Google Gadgets are being used.

The glitch is specific to Internet Explorer, the pair reported.

The post comes with detailed pics of what is happening. Well worth the read.

Posted by aussiewebmaster at 1:18 PM | Permalink

August 15, 2007

SEW Experts: Google vs. the World

In today's Searching for Meaning column, "Google vs. the World," Kevin Ryan is here to tell you that privacy is dead and your future lies in everyone else's hands.

Posted by Kevin Newcomb at 12:00 AM | Permalink

June 19, 2007

Google Sweet Google

Google Maps didn't photograph my cats, although my living room window is clearly visible in their shot of my building.

Rather, they immortalized me (as well as a neighbor) leaving our building.

The friend who called this to my attention notes, "Looks like it was taken in April...the facade on the new vitamin shop hasn't changed yet" (he strolled down the street to check).

It's a weird feeling, all right. But blurry enough so I don't feel violated or anything.

My friend added: "Now, of course, you need to Google your place of employ to see if they have you walking out of that."

I don't think I want to.

Posted by Rebecca Lieb at 12:39 PM | Permalink

June 12, 2007

Google Defends Data-Retention Practices

In response to an E.U. Article 29 Working Party investigation, Google has changed its data retention policies again. Instead of the 18-24 months that it announced in March as the cut-off for keeping server logs, Google will now anonymize its search server logs after 18 months, according to a post on the Google Blog by Peter Fleischer, Google's global privacy counsel.

The Working Party raised concerns over the length of time Google kept server data, as well as the length of time it set its cookies to expire. It also questioned the need to keep data or use cookies at all. Fleischer defends Google's policies, while making concessions on the length of time server data is kept and promising to reconsider the expiration dates on Google's cookies.

Danny Sullivan has a complete run-down of the convoluted saga at Search Engine Land.

Posted by Kevin Newcomb at 11:24 AM | Permalink

June 11, 2007

Google Privacy Practices Under Attack

Privacy International, a London-based privacy watchdog group, has issued a report citing Google's privacy practices as the worst among large online destinations.

None of the sites reviewed scored a "privacy friendly" ranking. Several were labeled "privacy aware" but needing improvements or generally aware but with "notable lapses." Sites like AOL, Facebook, Yahoo, and Microsoft Windows Live Spaces were labeled a "substantial threat." But Google was the worst offender of the bunch, according to the report, getting the only "hostile to privacy" label of the group.

The report didn't center on Google, but called out several players for their records on privacy:

While there may be a temptation to focus criticism on Google's privacy performance, it is important to note that not one of the ranked organizations achieved a "green" status. Overall, the privacy standard of the key Internet players is appalling, with some companies demonstrating either willful or a mindless disregard for the privacy rights of their customers. Even the better performing companies create lapses of privacy that are avoidable. With minimal effort most organizations can improve their privacy performance by at least one grade.

Privacy International spoke with AP, and followed the news coverage with an Open Letter to Google criticizing some of Google's responses to the media.

Yesterday, Danny Sullivan took Privacy International to task at Search Engine Land in "Google Bad On Privacy? Maybe It's Privacy International's Report That Sucks." Sullivan criticizes the lack of firsthand information used in the report, and points to several examples where Google seems to have been judged more harshly than other companies in the study for similar track records.

Google engineer Matt Cutts weighs in today with "Why I disagree with Privacy International."

Posted by Kevin Newcomb at 12:20 PM | Permalink

March 15, 2007

Google to Anonymize Server Logs

In an effort to put users at ease and eliminate some privacy concerns, Google will begin anonymizing server log data after 18-24 months.

"By anonymizing our server logs after 18-24 months, we think we’re striking the right balance between two goals: continuing to improve Google’s services for you, while providing more transparency and certainty about our retention practices," writes Nicole Wong, Google's deputy general counsel.

Once anonymized, the data cannot be used to trace information back to an individual user. Danny Sullivan runs down the details at Search Engine Land.

Posted by Kevin Newcomb at 8:43 AM | Permalink

November 8, 2006

Eric Schmidt At Web 2.0 On YouTube & Other Issues

John Battelle spoke with Eric Schmidt at Web 2.0 yesterday. What have we got? YouTube's growth made it a necessary purchase. No, money's not set aside to cover YouTube legal claims. Yes, you can have your data if you want it, users. No, Google's not trying to take out Microsoft Office. Plus some more below.

Google CEO Eric Schmidt: We would never trap user data from ZDNet has coverage that has Schmidt saying:

  • Google bought YouTube because it was growing faster than Google Video, and video was a "fundamental data type" to Google.
  • Google's still figuring out ways to compensate content owners with video, a complex area.
  • Google would support exporting personal data (search history, email, etc) to other providers, if it can be authenticated.
  • Google's office products are "casual" and not aimed at Microsoft.

Google CEO denies rumor of YouTube legal reserve from Reuters quotes Schmidt as saying "not true" to a rumor that $500 million of the YouTube sale price was set aside for legal claims.

@ Web 2.0: Day One Highlights: Ad 2.0; Google CEO; Skype Content from PaidContent covers Schmidt but also touches on IAC's Barry Diller saying in a separate interview that he doesn't expect Google will become a media monopoly or dominant player.

Web 2.0 Con: Liveblogging the "Conversation with Eric Schmidt" from Valleywag has a nice minute-by-minute rundown of the interview, for those that want more -- and covers that if Schmidt or one of the cofounders Larry Page or Sergey Brin don't agree on something, the cofounder wins. "I'm the one with the experience who's late. Left to their own devices they'd be early and right, but too early."

Posted by Danny Sullivan at 5:34 AM | Permalink

October 2, 2006

Reading Other People's Gmail Via Bloglines

Using Bloglines to snoop on people's private Gmail from Martin Belam looks at how he accidentally stumbled upon email feeds that individuals are posting to Bloglines. To be fair, it's an issue that could happen to any "private" feed that someone unknowingly shares to the public.

Gmail allows people to get a feed of their email, as covered in these help pages. That lets you see the subject of your emails along with short descriptions. But even this small amount of information might be too embarrassing for some people to have made public.

How would those summaries get made public at all? In the case Martin looks at, people are adding their Gmail feeds to Bloglines but leaving those feeds public for others to view. That's how he stumbled upon them.

Google does warn about this, but he thinks the warning could be more visible. Perhaps -- but it's also worth keeping in mind that using an online news reader means you need to carefully consider ANY feed you take and whether those settings are public or not.

Postscript From Bloglines:

Bloglines is committed to online privacy and we take our role in this effort seriously. I'd like to help correct some of the misconceptions and explain how Bloglines privacy works in regards to both search and feeds as well as how to use Bloglines properly to generate secure feeds.

The main issue at hand is the appearance of Gmail accounts in Bloglines and a user's ability to subscribe to these feeds (or search for posts from these feeds).

The examples displayed were actually Gmail accounts registered through a third party (Feedburner) and then subscribed to within Bloglines.

Bloglines actually provides HTTP authentication for secure feeds. When this method is used, Bloglines secures the feed so that it can not be searched on or subscribed to except by the owner of the feed.

However, when the user generates their feed through a third party like Feedburner, the authentication portion has been removed from Bloglines' control and we have no way to identify and secure the feed. As a result the feed and its previously secure data become public. Clearly this is a problem and we are in contact with Feedburner and other third parties to help them better inform and protect their users.

The other issue is the definition and understanding of "private" feeds within Bloglines. Marking a feed as private in Bloglines only hides the feed from your public blogroll and your identity from the feed's list of subscribers. We try to make this clear to Bloglines users by prominently displaying the following note during the feed subscription process:

"Private subscriptions don't show up in blogrolls and you will not be listed as a public subscriber. However, the feed and all its posts will remain available to the public via Bloglines and Ask.com Blog & Feed Search. Exceptions are Bloglines email subscriptions and feeds that require http authentication. In both cases, the feed and its posts will not be included in search results."

This issue has reminded us that there is still some confusion about privacy in the world of feeds. We recognize that a better system of limiting access to feeds is needed as more content becomes syndicated or syndicatable. We have been leading the effort to build new safeguards into syndications standards and are hopeful that some type of Feed Access Standard will provide further security for users and their feeds.

Posted by Danny Sullivan at 8:36 AM | Permalink

June 22, 2006

Google Updates Toolbar Privacy Policy

It appears to me that Google updated the Google Toolbar Privacy Policy yesterday. I know the dates do not reflect that on the page, but if you take a look at the current version and compare it to the cached version from Jun 16, 2006 you will notice a lot of changes. Below are some of the larger changes to the privacy policy.

+ Removed a bullet that read:

We do not associate any of the information that Toolbar sends with other personal information about you. However, it is possible that a URL or other page information sent to Google may itself contain personal information. For information about how some web sites embed personal information in web requests, click here.

+ Added/Changed Significantly the following bullets:

(1) Toolbar Features that give you access to other Google services such as Blogger and Gmail are subject to the separate Privacy Policies of those products. Features that require use of a Google Account, like Bookmarks, store information with your Account as explained in the main Google Privacy Policy. Other features, like SMS This, that let you transmit data from the Toolbar may log that data transmission, as explained in the FAQ.

(2) Third party site custom buttons send information such as search queries to sites that are not operated by Google or covered by Google's Privacy Policy.

(3) If you have Google Toolbar Version 4.0 or above, your copy of Google Toolbar includes a unique application number. When you install Google Toolbar, this number and a message indicating whether the installation succeeded are sent back to Google. Also, when Google Toolbar automatically checks to see if a new version is available, the current version number and the unique application number are sent to Google. The unique application number is required for Google Toolbar to work and cannot be disabled.

(4) Except for information sent through Toolbar for use with a separate Account-based service such as Gmail, we do not associate any of the information that Toolbar sends with other personal information about you. However, it is possible that a URL or other page information sent to Google may itself contain personal information. For information about how this may happen, click here.

Those are the changes I noticed.

Posted by Barry Schwartz at 9:03 AM | Permalink

March 8, 2006

Google Filings Against DOJ Request -- Including Declaration From Matt Cutts

I'm planning a deeper look at Google's rejection of the Department Of Justice search records request, which happened last week when I was on vacation. But a quick heads-up. Many of you may have seen Google's blog post on the subject here, which in turn leads to their formal filing here (PDF). But that wasn't the only filing. Catching up on my feeds this morning, I saw that Gary compiled a full list of Google filings over here (PDF). My eyebrows shot up when I saw Google's Matt Cutts had a long declaration as part of that package. I was planning to help spread the word more about this as part of an overall summary of what's in the various summaries, but Matt himself beat me to it with this blog post. So happy reading! I'll still be working on that general summary of everything hopefully for later this week.

NOTE: This was originally written on Feb. 22, but I've only just seen that it was left as a "draft" and never published. Sorry about that!

Posted by Danny Sullivan at 2:55 PM | Permalink

February 8, 2006

Google Introduces Marked Up Version Of Privacy Policy Changes

Google Brilliantly Updates Privacy Policy from Nathan at InsideGoogle notes that for Google Talk's privacy policy, you can now view a previous version where changes are highlighted. Nice. Other privacy policies at Google don't seem to have this yet. I'm guessing this will happen as each of them (such as toolbar or Gmail) is updated going forward. Most that I looked at were changed as part of a big privacy update Google did last October. Still, the Google personalized home page policy is dated as of January 2006, so it probably has changed since the October wave but has no guide to past versions. Prior versions of the general privacy policy can be found here.

Posted by Danny Sullivan at 12:34 PM | Permalink

January 24, 2006

Google Not Installing Third Party Cookies -- It's Firefox Prefetching

John Battelle spotted a post from Chris Marino at Tumbling Duke that has the worrisome suggestion that Google is allowing third parties to set cookies based on searches people do. But I dropped an IM to Dave Naylor, who immediately spotted this being due to Firefox prefetching.

If you use Firefox, Google will automatically preload the pages showing in the top search results. They made this change back in March. As they warned back then:

With prefetching enabled, you may end up with cookies and web pages in your web browser's cache from web sites that you did not click on since prefetching happens automatically when you view Google search results pages. You can delete these files by clearing your browser's cache and cookies.

So in Chris's case, he writes about how he searched for cars, Amazon and Walmart and got cookies from Cars.com, Amazon.com and Walmart. He assumed this is all related to AdWords in some way.

AdWords isn't the issue. For a search on cars, Cars.com was the first site listed, so that page was preloaded -- and that meant a cookie from Cars.com came with it. The same was true for Amazon and Walmart in searches on their names.
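For readers who want to check this themselves, here is a minimal sketch of how one might list the pages a results page asks the browser to preload. It assumes the hint is expressed as a `<link rel="prefetch">` tag, which is how Firefox link prefetching is normally triggered; the sample markup and the names `PrefetchLinkFinder` and `find_prefetch_urls` are made up for illustration and are not taken from Google's actual pages.

```python
from html.parser import HTMLParser


class PrefetchLinkFinder(HTMLParser):
    """Collects href values from <link rel="prefetch"> tags."""

    def __init__(self):
        super().__init__()
        self.prefetch_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated tokens, so split before checking.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "prefetch" in rel_tokens and attr_map.get("href"):
            self.prefetch_urls.append(attr_map["href"])


def find_prefetch_urls(html_text):
    """Return the URLs a page hints the browser should preload."""
    finder = PrefetchLinkFinder()
    finder.feed(html_text)
    return finder.prefetch_urls


# Hypothetical results page for a search on "cars" (illustrative markup only).
SAMPLE_RESULTS_PAGE = """
<html><head>
<link rel="prefetch" href="http://www.cars.com/">
<link rel="stylesheet" href="/style.css">
</head><body><a href="http://www.cars.com/">Cars.com</a></body></html>
"""

print(find_prefetch_urls(SAMPLE_RESULTS_PAGE))
```

Any URL this turns up is fetched by the browser without a click, which is why a cookie from that site can appear even though you never visited it.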

Posted by Danny Sullivan at 12:40 PM | Permalink


January 19, 2006

Court Documents & Summary Of United States Versus Google Over Search Data

Earlier we reported in Bush Administration Demands Search Data; Google Says No, Yahoo & MSN Said Yes that the US Government seeks to force Google to hand over search data. That story explains more about the situation, and there have been a number of postscripts from when it was first written. Along with that, we've been able to obtain copies of the three court documents filed in the case. Below you'll find links to each document, along with a summary of what's in each of them.

Alberto Gonzales, as Attorney General of the United States vs. Google Notice of Motion to Compel Compliance (PDF File)

Two quick points. First, remember that this brief was filed by the Government and does not include Google's response to its claims. I'm sure that will be coming. Second, I'm not an attorney and haven't played one on TV. My purpose was to summarize what was presented in the document.
  • The motion requests that Google comply with a subpoena filed by the Attorney General and "produce" for inspection and copying the materials the Government is asking for.
  • After the lead government attorney conferred with Google, Google chose not to comply with the subpoena.
  • The Government is asking the court to make Google comply.
  • The filing then goes into a background explanation of the Child Online Protection Act (COPA) and how the Government is developing its defense of the constitutionality of COPA. They believe that COPA is "more effective than filtering software in protecting from harmful exposure to harmful material on the Internet."
  • In preparation for the case, subpoenas were issued to Google and "other entities" that operate search engines to produce two sets of materials.
  • First, the subpoena asks Google to produce an electronic file containing "[a]ll URL's that are available to be located on your companys' search engine as of July 31, 2005."
  • However, after "lengthy negotiation" the Government "narrowed" its request and asked for a "multi-stage random sample of one million URLS" from Google's database, i.e., a random selection of the various databases in which those URLs are stored, and a random sample of the URLs held in those selected databases.
  • Second, Google was asked to "produce an electronic file containing [a]ll queries entered into the Google engine between July 1 and July 31 inclusive."
  • Again, after lengthy negotiations the Government changed its request and asked for an electronic file "containing the text of any search string entered into Google's search engine for a one week period (absent any personal information identifying the person who entered the query)."
  • Google has still refused to comply with these requests in any way.
  • The Government says that access to this information would be of "significant significance" in the preparation of its case.
  • Specifically why?
  • "The production set of queries entered into Google's search engine would assist the Government in its efforts to understand the behavior of current web users, to estimate how often web users encounter harmful-to-minors material in the course of their searches, and to measure the effectiveness of filtering in screening that material."
  • This information would also help the Government understand what "web sites people find through the use of search engines, to determine the character of those sites, to estimate the prevalence of harmful-to-minors material on those sites, and to measure the effectiveness of filtering software on that harmful to minors material."
  • The document continues into a discussion with plenty of legalese and citations, again points out that Google has failed to comply, and lists some of the reasons Google objects.
  • Google first objects on the grounds of relevancy.
  • Google also objects on the grounds that if it provided what the Government asks for, it would be required to produce information identifying the users of its search engine.
  • The Government claims that this concern is "illusory" since it has specifically asked for a random sample with no personally identifying information attached to any search string.
  • The Government says that it has received compliance from other search entities, in the form of files containing no personally identifying information.
  • Google also contends that the information it's being asked to produce is "redundant" since the Government has asked other engines to produce similar files. The Government argues that this "misunderstands" what's being requested. "The production set of queries from Google's database, in combination with similar productions from other search engine operators will assist the Government in developing a sample of the overall universe of search engines queries, while accounting for the potential of any variations in the type of queries that are entered into different search engines."
  • The Government says that since Google is the market leader, its response "would be of value" in developing the Government's overall sample of queries.
  • Google says that complying would also force it to share trade secrets, because the total number of queries it receives in a day is a trade secret. The Government adds that if this were the case, a district court has said that these numbers would not be disclosed.
  • Finally, according to the filing, Google says that it will be subject to an "undue burden" in complying. The Government claims that this is not the case whatsoever. The Government adds that it would be "willing to work" with Google to specify a multistage sample. It is also willing to compensate Google for its work in complying with the subpoena.
  • The filing ends with the Government saying that "This court should require Google to comply with the subpoena on the same terms its competitors have."
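To make the "multi-stage random sample" idea concrete, here is a rough sketch of two-stage sampling: first pick some databases at random, then pick URLs at random within each chosen database. This is only an illustration of the general technique described in the motion, not the Government's or Professor Stark's actual procedure; the function name `multistage_sample`, the shard names, and the sizes are all hypothetical.

```python
import random


def multistage_sample(databases, num_databases, urls_per_database, seed=None):
    """Two-stage random sample: choose databases, then URLs within each one.

    `databases` maps a (hypothetical) database name to its list of URLs.
    """
    rng = random.Random(seed)
    # Stage 1: randomly select which databases to draw from.
    names = sorted(databases)
    chosen_names = rng.sample(names, min(num_databases, len(names)))
    # Stage 2: randomly select URLs inside each selected database.
    sample = []
    for name in chosen_names:
        urls = databases[name]
        sample.extend(rng.sample(urls, min(urls_per_database, len(urls))))
    return sample


# Made-up data: five databases holding ten URLs each.
example_databases = {
    f"shard{i}": [f"http://example.com/{i}/{j}" for j in range(10)]
    for i in range(5)
}

# Draw 3 URLs from each of 2 randomly chosen databases.
print(multistage_sample(example_databases, 2, 3, seed=1))
```

Note that this simple version gives every URL the same chance of selection only when the databases are the same size. A real design would have to weight for database size, which may be part of why Google argues that it would need to count its URLs first.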

Declaration Of Joel McElvain (PDF File)

The second filing is a declaration by Government attorney Joel McElvain, who I believe is the lead attorney for the U.S. Department of Justice in this matter. It also helps produce a timeline of events to this point. It includes:
  • A copy of the original subpoena, signed on August 25, 2005
  • Detailed info and definitions about what Google was to submit to the Government.
  • A several-page letter, dated October 25, 2005, from Ashok Ramani, Commercial Litigation Counsel at Google, to Joel McElvain with Google's objections to the subpoena. THIS IS A MUST READ!!!
  • Key Quotes and Passages from the Letter

  • "It is against Google's competitive interest to be viewed as reflecting the whole world wide web."
  • Worth noting that Google says that the Government tried to use Archive.org/Wayback Machine and found the results unsatisfactory. From the letter, "...given the www.archive.org's stated purpose, one would expect them -- with an appropriate consulting relationship to create the results the DEFENDANT wanted."
  • The Government's request is seen as redundant because it already has URLs from at least one other engine.
  • From the letter, "Though the search engines doubtlessly have some differences in the URLs they store, what distinguishes Google from its competitors is the sophistication of Google's search engine in locating and ordering relevant results."
  • On the burden to Google: "Google would have to spend a disproportionate amount of engineering time and resources to (i) number (even in rough terms) in real time the URLs contained in its search database and (ii) extract based on that initial numbering the URLs selected by Professor Stark."
  • Google also objects because complying could "endanger" its "crown-jewel trade secrets." Specifically, it would have to disclose the approximate number of URLs in its database and "some" details on how it crawls URLs, "such as the number of servers, server distribution, and how often Google crawls the World Wide Web."
  • More objections. "Google objects to the Defendant's view of Google's highly proprietary queries database as a free resource that Defendant can use, some levels removed, to formulate its own defense."
  • "Moreover, Google's acceding to the Request would suggest that it is willing to reveal information about those who use its services. This is not a perception Google is willing to accept. And one can envision scenarios where queries alone could reveal identifying information about a specific Google user, which is another outcome we cannot accept."
  • Next, we find another letter. This time it's from DOJ's McElvain to Google's Ramani. This letter is dated December 23, 2005.
  • The letter discusses how the Government is willing to narrow what's asked for in the subpoena.
  • This is summarized in the Alberto Gonzales, as Attorney General of the United States vs. Google section of this post.
  • McElvain discusses how Google asked for and was granted two extensions, until October 10, 2005, to serve its objections to the subpoena. He then writes, "In our several discussions prior to the service of those objections we had offered to limit the scope of the requests for production, and you had indicated Google's willingness to consider compliance with the subpoena along with the narrowed terms that we had suggested. Your written objection also reiterated your hope to reach a resolution regarding Google's compliance with the subpoena. However, shortly after the service of your objections, you telephoned me to inform me that Google would decline to comply with the subpoena."
  • More conversations between the Government and Google took place on December 12th and December 21st to discuss the technical aspects of the request. Finally, on December 21st, McElvain was informed that Google would not comply with the subpoena.
  • The final document is a protective order in the ACLU v. U.S. case.

Declaration Of Philip B. Stark (PDF File)

This document is a declaration by Philip Stark, Ph.D., who worked on the project. Dr. Stark is a Professor of Statistics at the University of California, Berkeley.
  • Stark explains how he has had conversations with the USDOJ, Google and other search providers "to develop practical approaches to sampling their databases or URLs and search queries."
  • He adds that he has started to analyze the samples produced by search providers other than Google.
  • He writes, "Reviewing user queries to search engines will help us understand the search behavior of current web users, to estimate how often web users encounter HTM materials through searches, and to measure the effectiveness of filters in screening those materials."
Stark goes on to add more about his approach and about why including Google results is directly relevant.

Posted by Gary Price at 4:18 PM | Permalink


Court Documents & Summary Of United States Versus Google Over Search Data

Earlier we reported in Bush Administration Demands Search Data; Google Says No, Yahoo & MSN Said Yes that the US Government seeks to force Google to hand over search data. That story explains more about the situation, and there have been a number of postscripts from when it was first written. Along with that, we've been able to obtain copies of the three court documents filed in the case. Below you'll find links to each document, along with a summary of what's in each of them.

Alberto Gonzalez, as Attorney General of the United States vs. Google Notice of Motion to Compel Compliance (PDF File)

Two quick points. Remember, that this brief was filed by the Government and does not offer a response to their claims. I'm sure that will be coming. Second, I'm not an attorney and haven't played one on tv. My purpose was to summarize what was presented in the document.
  • The motions requests that Google comply with a subpoena filed by the Attorney General and "produce" for inspection and copying the materials the Government is asking for.  
  • After the lead government attorney conferred with Google, Google has chosen not to comply with subpoena.  
  • Google is asking the court to make Google comply  
  • The filing then goes into a background explanation about the Children's Online Protection Act (COPA) and how the government is developing its defense of the constitutionality of COPA. They believe that COPA is, "more effective than filtering software in protecting from harmful exposure to harmful material on the Internet."  
  • In preparation of the case, subpoenas were issued to Google and "other entities" that operate search engines to produce two sets materials.  
  • First, the subpoena asks Google to produce an electronic file contain, "[a]ll URL's that are available to be located on your companys' search engine as of July 31, 2005.  
  • However, after "lengthy negotiation" the government changed and "narrowed" their request and asked for a "multi- stage random sample of one million URLS from Google's database ie, a random selection of the various databases in which those URL's are stored, and a random sample of the URL's held in those selected databases.  
  • Second, Google was asked to "produce an electronic file containing [a]ll queries entered into the Google engine between July 1 and July 31 inclusive.  
  • Again, after lengthy negotiations the government the government changed their request and asked for an electronic file "containing the text of any search string entered into Google's search engine for a one week period (absent any personal information identifying the person who entered the query).  
  • Google has still refused to comply with these requests in any way.  
  • The Government says that access to this information would be of "significant significance" in the preoperation of the their case.
  • Specifically why?  
  • "The production set of queries entered into Google's search engine would assist the Government in its efforts to understand the behavior of current web users, to estimate how often web users encounter harmful-to-minors material in the course of their searches, and to measure the effectiveness of filtering in screening that material."  
  • This information would also help the Government understand what, "web sites people find through the use of search engines, to determine the character of those sites, to estimate the prevalence of harmful-to-minors material on those sites, and to measure the effectiveness of filtering software on that harmful to minors material.  
  • The document continues into a discussion with plenty of legalese and citations and again points out the Google has failed to comply and lists some of the reason Google objects to this.  
  • Google first objects to this on the grounds of relevancy.  
  • Google also objects on the grounds that if they would provide what the government asks for, they would be required to produce information identifying the users of its search engines.  
  • The Government claims that this is "illusory" since they have specifically asked for a random sample containing no personally identifying information to any search string.  
  • The Government said that it has received compliance from search entities with files containing no personally identifying information.  
  • Google also contends that the information they're being asked to produce is "redundant" since the Government has asked other engines to produce similar files. The Government argues that this "misunderstands" what's being requested. "The production set of queries from Google's database, in combination with similar productions from other search engine operators will assist the Government in developing a sample of the overall universe of search engines queries, while accounting for the potential of any variations in the type of queries that are entered into different search engines."  
  • The Government says that since Google is the market leader, its response "would be of value" in developing the Government's overall sample of queries.  
  • Google says that complying would also force it to share trade secrets, because the total number of queries it receives in a day is a trade secret. The Government adds that if this is the case, a district court has said that these numbers would not be disclosed.  
  • Finally, according to the filing, Google says that it will be subject to an "undue burden" in complying. The Government claims that this is not the case whatsoever. The Government adds that it would be "willing to work" with Google to specify a multistage sample. It is also willing to compensate Google for its work in complying with the subpoena.  
  • The filing ends with the Government saying that, "This court should require Google to comply with the subpoena on the same terms its competitors have."

Declaration Of Joel McElvain (PDF File)

The second filing is a declaration by Government attorney Joel McElvain, who I believe is the lead attorney for the U.S. Department of Justice in this matter. It also helps produce a timeline of events to this point. It includes:
  • A copy of the original subpoena, originally signed on August 25, 2005
  • Detailed info and definitions about what Google was to submit to the Government.
  • A several-page letter, dated October 25, 2005, from Ashok Ramani, Commercial Litigation Counsel at Google, to Joel McElvain with his objections to the subpoena. THIS IS A MUST READ!!!
  • Key Quotes and Passages from the Letter

  • "It is against Google's competitive interest to be viewed as reflecting the whole world wide web."
  • Worth noting: Google says that the Government tried to use Archive.org/Wayback Machine and found the results unsatisfactory. From the letter, "...given the www.archive.org's stated purpose, one would expect them -- with an appropriate consulting relationship to create the results the DEFENDANT wanted."
  • The Government's request is seen as redundant because it already has URLs from at least one other engine.
  • From the letter, "Though the search engines doubtlessly have some differences in the URLs they store, what distinguishes Google from its competitors is the sophistication of Google's search engine in locating and ordering relevant results."
  • On the burden to Google: "Google would have to spend a disproportionate amount of engineering time and resources to (i) number (even in rough terms) in real time the URLs contained in its search database and (ii) extract based on that initial numbering the URLs selected by Professor Stark."
  • Google also objects because complying could "endanger" its "crown-jewel trade secrets." Specifically, Google would have to disclose the approximate number of URLs in its database and "some" details on how it crawls URLs, "such as the number of servers, server distribution, and how often Google crawls the World Wide Web."
  • More objections. "Google objects to the Defendant's view of Google's highly proprietary queries database as a free resource that Defendant can use, some levels removed, to formulate its own defense."
  • "Moreover, Google's acceding to the Request would suggest that it is willing to reveal information about those who use its services. This is not a perception Google is willing to accept. And one can envision scenarios where queries alone could reveal identifying information about a specific Google user, which is another outcome we cannot accept."
  • Next, we find another letter. This time it's from DOJ's McElvain to Google's Ramani. This letter is dated December 23, 2005.
  • The letter discusses how the Government is willing to narrow what's asked for in the subpoena.
  • This is summarized in the Alberto Gonzalez, as Attorney General of the United States vs. Google section of this post.
  • McElvain discusses how Google asked for and was granted two extensions, until October 10, 2005, to serve its objections to the subpoena. He then writes, "In our several discussions prior to the service of those objections we had offered to limit the scope of the requests for production, and you had indicated Google's willingness to consider compliance with the subpoena along with the narrowed terms that we had suggested. Your written objection also reiterated your hope to reach a resolution regarding Google's compliance with the subpoena. However, shortly after the service of your objections, you telephoned me to inform me that Google would decline to comply with the subpoena."
  • More conversations between the Government and Google took place on December 12th and December 21st to discuss the technical aspects of the request. Finally, on December 21st, McElvain was informed that Google would not comply with the subpoena.
  • The final document is a protective order in the ACLU v. U.S. case.

Declaration Of Philip B Stark (PDF File)

This document is a declaration by Philip Stark, Ph.D., the person working on the project. Dr. Stark is a Professor of Statistics at the University of California, Berkeley.
  • Stark explains how he has had conversations with the USDOJ, Google and other search providers, "to develop practical approaches to sampling their databases of URLs and search queries."
  • He adds that he has started to analyze the samples produced by search providers other than Google.
  • He writes, "Reviewing user queries to search engines will help us understand the search behavior of current web users, to estimate how often web users encounter HTM [harmful-to-minors] materials through searches, and to measure the effectiveness of filters in screening those materials."
Stark goes on to describe his approach and to explain why including Google's results is directly relevant.
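
For readers wondering what the "multi-stage random sample" in the narrowed request might look like in practice, here is a rough Python sketch. The filings describe the idea only in words: randomly select some of the databases in which URLs are stored, then randomly sample URLs from those selected databases. Everything else here is an assumption made for illustration, including the function name `multistage_sample`, the dictionary layout and the example URLs. None of it reflects how Google stores its index or how Dr. Stark actually drew his samples.

```python
import random


def multistage_sample(databases, n_databases, n_urls_per_db, seed=None):
    """Two-stage random sample (hypothetical helper, not from the filings).

    databases: dict mapping a database name to the list of URLs it holds.
    Stage 1 picks databases at random; stage 2 picks URLs inside each one.
    """
    rng = random.Random(seed)

    # Stage 1: randomly choose which databases to draw from.
    names = sorted(databases)
    chosen = rng.sample(names, min(n_databases, len(names)))

    sample = []
    for name in chosen:
        urls = databases[name]
        # Stage 2: randomly choose URLs within the selected database.
        sample.extend(rng.sample(urls, min(n_urls_per_db, len(urls))))
    return sample


# Toy data: five made-up databases holding ten made-up URLs each.
toy = {
    f"db{i}": [f"http://example.com/{i}/{j}" for j in range(10)]
    for i in range(5)
}
picked = multistage_sample(toy, n_databases=2, n_urls_per_db=3, seed=1)
```

With the toy data, `picked` holds six URLs drawn from two of the five databases. One caveat: taking the same number of URLs from every selected database only gives each URL an equal chance if the databases are about the same size, so a real design would have to weight for that.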

Posted by Kevin Heisler at 4:18 PM | Permalink

Bush Administration Demands Search Data; Google Says No; AOL, MSN & Yahoo Said Yes

NOTE: We're continuing to update this news through postscripts below the original story.

Via John Battelle and Good Morning Silicon Valley, the San Jose Mercury News article "Feds want Google search records" covers the Bush administration demanding last year that Google and other search engines turn over aggregate search information to help revive a child protection law. Google has refused to comply with the subpoena. A motion has been filed this week by the US Department of Justice to force Google to hand over the data.

In particular, the Bush administration wanted one million random web addresses and records of all Google searches for a one week period. The government apparently wants to estimate how much pornography shows up in the searches that children do.

Here's a thought. If you want to measure how much porn is showing up in searches, try searching for it yourself rather than issuing privacy alarm sounding subpoenas. It would certainly be more accurate.

Getting a list of all searches in one week definitely would let the US federal government dig deep into the long tail of porn searches. But then again, the sheer amount of data would be overwhelming. Do you know every variation of a term someone might use, that you're going to dig out of the hundreds of millions of searches you'd get? Oh, and be sure you filter out all the automated queries coming in from rank checking tools, while you're at it. They won't skew the data at all, nope.

Moreover, since the data is divorced from user info, you have no idea what searches are being done by children or not. In the end, you've asked for a lot of data that's not really going to help you estimate anything at all.

Far better would be to do some searches that you think children and teens are actually doing, such as by doing a survey of them. Then just go start searching on Google and the other search engines yourselves. See what actually comes up, especially when the filtering protection each service offers is enabled. That would give you plenty of data, plus it would be useful for everyone to have someone rigorously test the filtering systems that are offered. Serving subpoenas to get the data isn't necessary.

It's important to note that from what I read, the requests do not involve user data at all. Shutting off your cookies or purging your personalized search data wouldn't protect you with this request, because the request wasn't going after personal data. To stress again:

  • According to the report, they wanted a list of one million web addresses. Not who went to the web pages and when, just a list of URLs picked randomly.  
  • They wanted searches for one week. I haven't seen the court documents, but I'm guessing Google could have handed over a list of searches that were entirely unassociated with IP addresses, times, cookies and registration information. Nothing suggests that they wanted to know who did the searches in any way.

Having said this, such a move absolutely should breed some paranoia. They didn't ask for personal data this time, but next time, they might. Of course, it bears reminding that this type of data is easily obtainable from ISPs. So even if the search engines refuse to comply, your own ISP could be giving up your data -- or selling it.

Overall, I say kudos to Google for declaring the request overreaching and refusing to comply. I'm checking with the other major search engines to see if they handed over data.

I've spoken and written a bit about the idea that the search engines need to consider creating a clear "Search Privacy Bill Of Rights," spelling out clearly what protections they'll pledge you'll always have with your data and exactly how it will be used, destroyed and so on. I want to move ahead with more explorations of this -- and perhaps we need a similar one enacted by governments to spell out what they will and will not do with our highly private search data.

Moving Past Google Privacy Fears & Toward An Industry Solution from me last year gives you a lot of background on search privacy issues from over the years. There's an extensive reading list at the bottom.

After I put that out, I also created a thread at our Search Engine Watch Forums, How Should Search Engines Protect Privacy?. Unfortunately, that thread -- while it got lots of discussion -- never generated as many concrete ideas and suggestions about what should go in a Search Privacy Bill Of Rights as I hoped for. So I'm trying again. Got thoughts, comments, suggestions? Please visit our new thread, A Search Privacy Bill Of Rights.

Meanwhile, want to talk about this particular move by the Bush Administration? I have a different thread for that, Bush Administration Demands Search Records.

Postscript 1: I have queries out to AOL, Ask Jeeves, MSN and Yahoo to find out if they provided data. I'll note answers here or in a new post.

Postscript 2: I said above that a more accurate way for the government to assess how often children might encounter porn through search engines would be to conduct its own research. Indeed, it has. Government Report Says MSN Search Adult Filter Most Effective from the SEW Blog back in June covers this report (PDF format) from the US Government Accountability Office. From what I can see, it measured how often children might encounter porn through image search. To do the assessment, no subpoenas were required. From what I posted in our active Bush Administration Demands Search Records discussion at the Search Engine Watch Forums on today's news:

FYI, back to the idea of child filters on search engines, the US government has tested this, as Government Report Says MSN Search Adult Filter Most Effective covers. Note that to do this, they said:

We performed unfiltered 5-minute searches for six keywords: three keywords known to be associated with pornography and three innocuous terms that juveniles would likely use (a popular teenage singer/actress, a popular cartoon, and a popular movie character).

The US Government Accountability Office managed to do this assessment without issuing a subpoena to anyone. Moreover, the report has the stats the government says it wants, already produced and ready to go. Pages 48 and 67 have details. The caveat is that this seems to have been a test of image search results (Yahoo was 92 percent non porn, MSN 76 percent, Google 64 percent). But you could do the same thing to measure web search.

Postscript 3: Here's the official Google statement from Nicole Wong, associate general counsel with Google. It's what they already told the San Jose Mercury News and are telling other publications:

Google is not a party to this lawsuit and their demand for information overreaches. We had lengthy discussions with them to try to resolve this, but were not able to and we intend to resist their motion vigorously.

Postscript 4: MSN's statement is below. It doesn't really answer the question, which was whether they complied with a subpoena to hand over data similar to what Google's being sued over. Since it's not a denial, I'm reading this as a tentative yes, that they got a request and passed the data along. I've asked for clarification. The statement:

MSN works closely with law enforcement officials worldwide to assist them when requested. Microsoft fully complies with the Electronic Communications Privacy Act and United States Law as well as Microsoft's terms of use and privacy policies in working with law enforcement. It is our policy to respond to legal requests in a very responsive and timely manner in full compliance with applicable law. MSN takes the safety of its customers very seriously and is committed to providing a safe experience for consumers. As stated in MSN’s Terms of Use and Subscription Agreements, Microsoft will comply with applicable law to edit, refuse to post, or to remove any information or materials, in whole or in part, in Microsoft's sole discretion.

Postscript 5: It's important to note this case is not about stopping child porn. It's about trying to get a law passed that would help the government shut down sites that allow children themselves to access porn. To prove a need for the law, the US government wants to show how much porn children might encounter through searches. It's easy to confus