Remember when Google and Viacom were friends? Ah, those were the days. But not anymore. Over a year ago, Viacom filed suit against Google for the copyright infringement found in YouTube videos. In the latest plot point in the ongoing saga, U.S. District Judge Louis Stanton has ruled that Google can keep its source code secret, but must hand over user logs for the popular video sharing site.
Viacom says it wanted the code to prove that Google could use it to "purposely" find the content in question. Nice try, Viacom. Google's code, of course, is a trade secret. But it's almost a wonder the judge protected the code, because he ruled that Viacom can have access to the user logs. Data to be released includes user names, IP addresses, and videos watched.
Google has often defended its data collection, saying it's not a threat to privacy. It appears the argument worked a little too well on Judge Stanton.
For a history of the Google-Viacom battle, check out these links:
Google Fights Back in Viacom/YouTube Copyright Suit
Others Join YouTube, Google Copyright Lawsuit
Viacom Would Rather Not Sue, Chief Counsel Claims
Google to Viacom: Don't Turn YouTube into SueTube
Posted by Nathania Johnson at 10:52 AM | Permalink | Comments (0)
The U.S. House of Representatives passed the Prioritizing Resources and Organization for Intellectual Property Act despite opposition from the Department of Justice.
The act, sponsored by Reps. John Conyers (D-Mich.) and Lamar Smith (R-Texas), would allow for forfeiture of property such as computers and other equipment used by convicted copyright infringers.
While this is mainly aimed at music and movie piracy and is backed by the entertainment industry, it will be interesting to see whether it could be applied to website content theft. If so, this could create all sorts of interesting developments for the future of the web.
Scrapers and other thieves of copyrighted material could be risking a lot more than dropped Google listings.
Posted by Frank Watson at 8:11 PM | Permalink | Comments (1)
A few hours ago, before I went to bed, I blogged about someone who stole my SEW column content for their own online marketing blog within hours of its publication. I also commented on their blog and asked them to remove the offending content.
This morning, things look different:
The reply starts with an excuse: "my blog has very little content as I am only testing it at the moment" - as if the perpetrator's low readership makes their actions justifiable. It goes on to say that the blogger "didn’t mean to leave your part copied article on the site". That's what a small child might be expected to say if caught doing something wrong. It further states "I was surprised to see that Google had indexed it". Isn't that why we blog in the first place? If the content was good enough to post on a public website to promote their business, there should not be any great surprise when Google picks it up.
I find all of this in keeping with the original theft of my content and do not view it to be a satisfying explanation.
However, by the end of the response the person had apologized, removed the stolen content, and promised not to do it again.
Here is what I have done for my part to edit my original post:
Posted by Tim Ash at 12:40 PM | Permalink
Globalization of Content Theft - A Personal Story
[This entry has been edited since its original posting. Please read the follow-up post after reading this one.]
I worked for many years to build up my expertise and reputation in online marketing. I considered my recent addition as a "By The Numbers" expert columnist on Search Engine Watch as an honor. My first column on Landing Page Neglect just appeared yesterday and I wanted the whole world to see it.
Unfortunately, at least one reader and member of the Global Village did more than that. An unscrupulous so-called online marketing expert in the U.K. stole my column and posted it on his own blog.
I have asked him to remove it. In case he does, here is a screenshot of the original entry and my comment asking him to rectify the situation.
So what are we to do in this friction-free Internet world? If stealing is as easy as cut-and-pasting, and there is no legal or financial leverage over a thief who is in another country or legal jurisdiction, then what recourse do we have?
I think that one answer may be to use the medium itself against the offenders.
It is easy to steal. But it is also easier than ever to detect such theft, and expose it. It seems like the days of public shaming are a quaint relic of yesteryear. But I vote to bring them back. We should not tolerate liars and cheats in our midst and should use the very medium that enabled the transgression to help rectify it.
Do not do business with [Name edited out] of [Location edited out] - he is not an honorable man.
Please see my follow-up post.
Posted by Tim Ash at 2:04 AM | Permalink
Nielsen will release a service enabling broadcasters and cable networks to control and make money from their online video distribution (per today’s WSJ, subscription only). Through fingerprinting technology, the video may be blocked, permitted to load, or "perhaps load only if it is attached to a particular piece of advertising."
This announcement makes me wonder who holds the keys to video-related ads. With Nielsen acting as a neutral party, I would like to believe the largest rights holders keep control of their ad sales and sources.
However, we can't predict new moves from social networks, such as YouTube. What if the network itself starts to block copyrighted clips, but you want to show your clips and ads? What if the network begins showing ads that somehow interrupt yours? What if you prefer to use the network's ad inventory after all?
Regardless of these unknowns, the Nielsen announcement is interesting news. We'll see who gets real traction in this "video cop" marketplace, and how they charge for or otherwise monetize their services.
Posted by Deborah Richman at 2:39 AM | Permalink
Search marketers will almost certainly run into copyright issues at some point in their careers. They may be the victim, finding their own optimized content duplicated without permission and showing up in targeted search results. Or they may be an infringer, stealing copyrighted content from others and finding themselves subject to penalties by the search engines and the courts.
Thankfully, most online copyright infringement issues can be handled with some simple legal procedures. In today's SearchDay, "Copyright Law: What Search Marketers Should Know (Part 1)," Grant Crowell outlines the basics of cease & desist letters, the Digital Millennium Copyright Act (DMCA), and other tactics to help search marketers protect their content.
Posted by Kevin Newcomb at 3:33 PM | Permalink
A Viacom spokesperson called me a few minutes ago, breaking this news, and sending along an official statement. Today, MTVNetworks and its parent company Viacom are issuing an ultimatum to Google/YouTube: remove unauthorized content or else...
MTVNetworks/Viacom says that over 100,000 unauthorized clips of its video content – representing 1.2 billion video streams - appearing within Google and YouTube, must be removed immediately from its site.
The recent talk of adding short video ads ahead of content on YouTube may have been the last straw for MTV and Viacom, who clearly did not want Google to profit from showing unauthorized clips.
After months of ongoing discussions with YouTube and Google, it has become clear that YouTube is unwilling to come to a fair market agreement that would make Viacom content available to YouTube users. Filtering tools promised repeatedly by YouTube and Google have not been put in place, and they continue to host and stream vast amounts of unauthorized video. YouTube and Google retain all of the revenue generated from this practice, without extending fair compensation to the people who have expended all of the effort and cost to create it. The recent addition of YouTube-served content to Google Video Search simply compounds this issue. Virtually every other distributor has acknowledged the fair value of entertainment content and has taken deliberate steps to concluding agreements with content providers.
We have great respect for and loyalty to our audiences. We host more than 130 authorized web sites where millions of fans visit and interact with our content. Our internet portfolio has more visitors than any other entertainment company and we are always seeking distribution relationships to ensure that any of our products and services are easily accessible on every platform.
Our hope is that YouTube and Google will support a fair and authorized distribution model that allows consumers to continue to enjoy our very popular content now and in the future.
Posted by Elisabeth Osmeloski at 10:55 AM | Permalink | Comments (0)
Seems the courts have taken the same view of trademark use in online advertising as Google has, according to a report at TechDirt.
Eric Goldman's blog, which TechDirt refers to, states: "The court holds that, as a matter of law, the use of keyword-triggered ads and keyword metatags cannot confuse consumers if the resulting ads/search results don't display the plaintiff's trademarks".
Posted by Frank Watson at 6:32 PM | Permalink
The NY Times reports that Yahoo has recently rejected Google's subpoena for help with the Google Book Search project legal woes. Reportedly, Yahoo turned down Google's request for reasons similar to those Amazon gave when it turned down the same request. If you are interested, I have posted the full court filing on my server as a PDF download.
Posted by Barry Schwartz at 9:11 AM | Permalink
This week, news emerged about an agreement between Google and two Belgian author groups that were suing it over copyright issues. Below, a short Q&A on what this means for Google. Highlights: The case goes on with three other groups taking part, but large damages seem unlikely. The new deal especially seems to give Google photo rights. Google says it is not doing an about-face on opt-out in Denmark. More about these and other issues is covered below, based on a talk with Google spokesperson Jessica Powell. Plus, some bonus stats on how much traffic newspapers get from search engines.
Q. The case was originally filed against Google by Copiepresse. What are the other groups that joined and when did they come on?
A. In mid-October, Sofam, Scam, SAJ and Assucopie all joined the case after Google posted the Belgian court ruling in late September.
Q. Who remains as part of the case?
A. Copiepresse, SAJ and Assucopie.
Q. Has Google paid any fines in the case so far?
A. Despite rumors, Google reiterated today that it has not been asked to pay any fines.
Q. If Google loses the case, will it have to pay any damages?
A. Google says it hasn't been asked to pay any fines.
Q. What do the new agreements with the author groups Sofam and Scam allow?
A. Sofam represents Belgian photographers while Scam covers mainly audio/video content. Exact uses are being worked out. As with the AP deal, Google highlighted this as providing new uses rather than a solution to the legal challenges over spidering and thumbnail image use. "It's a way for us to use their content in new ways beyond what copyright law currently allows us without the permission of the authors," Powell said.
Q. Was there a financial aspect to the agreement?
A. Google's not commenting. Google is definitely paying the Associated Press to use some of its content, as the AP itself has reported. However, the exact terms, mechanisms or amounts have never been disclosed. Google wouldn't get into specifics on the financial details on the two Belgian deals other than to say these were deals that will allow the search engine to use the content in new ways.
Q. Is Google talking with the other parties to the suit?
A. Google said it won't comment on discussions but that it's always open to dialogue.
Q. Did Google reverse course and go opt-in for Google News Denmark?
A. Google says it chose to only launch in Sweden and Norway and that going forward it is not planning on an opt-in model in Denmark or elsewhere. The reason, says Powell, is that the company believes Google News complies with copyright law. "If publishers don't want their websites to appear in search engines, robots.txt enables them to automatically prevent their content from being indexed. And we even go beyond that: if a newspaper doesn't want to be a part of Google News, they only need to ask, and we remove them."
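To make the robots.txt opt-out Powell describes a bit more concrete, here is a minimal sketch using Python's standard `urllib.robotparser` module. The robots.txt content, the `/news/` path, the example.com URLs and the "OtherBot" crawler name are all hypothetical, made up for illustration; this is not taken from any publisher's actual file or from Google's own crawler code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve. The /news/ path is
# an illustrative assumption, not any real newspaper's configuration.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /news/

User-agent: *
Disallow:
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a parser that answers can-fetch questions."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A crawler that honors robots.txt asks before fetching each URL.
story_url = "http://example.com/news/story.html"
googlebot_allowed = parser.can_fetch("Googlebot", story_url)
other_allowed = parser.can_fetch("OtherBot", story_url)

print("Googlebot may fetch story:", googlebot_allowed)
print("OtherBot may fetch story:", other_allowed)
```

In this sketch the publisher blocks one named crawler from its news section while leaving every other crawler free to index it. That is the opt-out model in a nutshell: inclusion is the default, and the publisher has to say no.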
Between The Lines Time
The use of news images is one of the touchiest areas for Google to deal with, as I covered more in my Search Engines, Permissions & Moving Forward In Copyright Battles article.
The Sofam deal might help solve some of Google's legal issues in Belgium. The group represents the rights of nearly 4,000 photographers in Belgium, Google said. Google did NOT say how this might translate into usage at Google News. However, potentially this means Google can have photos in Google News even from publications that it had to remove from Google Belgium by court order. The Sofam deal might provide legal cover there. Of course, if those publications are the only source of certain photos -- and they block use through systems like robots.txt -- that would still keep the content out of Google. I'm also following up more on this particular issue.
The deals do not restore access for Google to list textual news stories it finds. That means it has to remain hopeful that the legal case will go its way, if it wants to prevent some type of negotiations with the publishers that have opted-out.
If the case goes against Google, it doesn't appear to be facing major damages. If these were to be levied, that should have happened when it lost the first time. Instead, the publishers will remain out of Google, making Google News Belgium less useful than it would be. However, they also deny themselves traffic from Google. Possibly Google might negotiate a payment-based system to include them. Equally possible, it might also decide to hold its ground and focus attention on other countries, to see if it can wait the publishers out.
If the case goes for Google, then it regains content that will help enhance Google News Belgium, unless those publishers decide to specifically block spidering, which Google would almost certainly honor.
Overall, the action in Belgium -- as with Denmark -- underscores that in smaller markets, Google (and other search engines) may come under increasing pressure to negotiate deals to list material. The players are fewer and have more power concentrated among them. Whether these will be lucrative deals remains to be seen. In smaller markets, Google might decide it's simply not worth figuring out some type of financial arrangement -- especially for Google News which carries no ads, so generates no direct revenue. That might bring about more non-financial arrangements where the publishers cooperate for the benefit of getting traffic and also being dealt with personally by Google, rather than impersonally through automated permissions systems like robots.txt.
Traffic From Search Engines
As an aside, I got a request from another reporter trying to understand how much traffic newspapers get from search engines. My response:
There's no specific answer to this. It will vary from paper to paper. Places like the New York Times will likely get a lot, because they specifically work to generate search traffic. Papers such as those suing Google in Belgium are getting probably nil, since they were removed by court order from Google.
In general, surveys have found sites getting anywhere from 8 to 13 percent of traffic from search engines. That might not sound like much, but often the first visit leads to repeat visits.
I also included two people on my response who I thought might have some better stats. Marshall Simmonds, chief search strategist for the New York Times Company, came back with this:
The one stat I can report is the NYT gets approximately 22% of its traffic from search engines. This number is very actively growing.
Bill Tancer, over at Hitwise, reported this:
Hitwise tracks 800,000 sites divided into 170 industry categories. One of those categories is our News & Media – Print category which covers Newspaper and Magazine websites (3,180 sites total). For the week ending 11/18/06 (based on our U.S. sample), Google was the #1 site sending traffic to the category at 13.66%, Search Engines as a whole were responsible for 22.44% of traffic for that same week.
That's a lot of traffic, however you slice it. There's no doubt things like Google News help build Google up as a company. But at the same time, Google News drives a ton of traffic to newspapers that are seeing the web as a new revenue source that might save them as print subscriptions dry up.
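For anyone who wants to slice the Hitwise figures a little further, here is a rough sketch of the arithmetic. It uses only the two percentages Tancer quoted for the week ending 11/18/06; the variable names are mine, and the derived figures are simple division and subtraction, not numbers Hitwise itself reported.

```python
# Figures quoted by Hitwise for the News & Media - Print category,
# week ending 11/18/06 (U.S. sample).
google_pct_of_all_traffic = 13.66   # Google's share of all category traffic
search_pct_of_all_traffic = 22.44   # all search engines combined

# Derived: what fraction of search-referred traffic came from Google?
google_share_of_search = google_pct_of_all_traffic / search_pct_of_all_traffic

# Derived: share of all category traffic sent by search engines other than Google.
other_engines_pct = search_pct_of_all_traffic - google_pct_of_all_traffic

print(f"Google's share of search referrals: {google_share_of_search:.1%}")
print(f"Other search engines, share of all traffic: {other_engines_pct:.2f}%")
```

By this back-of-the-envelope math, Google alone accounted for roughly six out of every ten search referrals to newspaper and magazine sites that week.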
Posted by Danny Sullivan at 12:35 PM | Permalink
Via Techmeme, news that Google has settled with two Belgian publishing groups involved in a lawsuit against it over content included in Google News Belgium. This comes a day after Google's legal case was reheard in an appeal. The settlement, following what seems a similar settlement with AP earlier this year, suggests that Google is going to continue making such appeasements rather than fight cases in court.
Bloomberg reports that Google struck an agreement with Sofam -- which represents Belgian photographers -- and Scam, which represents Belgian journalists. The agreement allows for Google to use content from these groups (or from their members). Whether they are being paid for this, what content or how it will be used is not explained:
"We reached an agreement with Sofam and Scam that will help us make extensive use of their content," Jessica Powell, a spokeswoman for Google, said in a phone interview yesterday. She declined to give details of the agreement or say whether it involved paying the groups for the content, and declined to say whether Google, based in Mountain View, Calif., was considering similar accords with the newspapers.
In September, Google lost a copyright case filed against it by another Belgian publishing group, Copiepresse. Google later had to post the ruling against it on Google Belgium. However, Google was granted an appeal for the case to be reheard, as it hadn't been represented in court the first time. The stories below provide more background on all of this:
At some point, Sofam and Scam joined in the case. I see one reference to this back in October. Two other groups also apparently joined, since the Bloomberg report speaks to the settlement being with two of five total parties to the suit.
Those parties, led by Copiepresse, continue on in their action against Google. That action, as I've covered in my Google's Belgium Fight: Show Me The Money, Not The Opt-Out, Say Publishers article, is far more about trying to pressure Google into a financial arrangement to use Belgian news content than keeping that content out of Google itself. If it was just to keep content out of Google, the publishers could have easily done this through methods such as using robots.txt files.
Copiepresse seems confident of a legal victory:
Speaking on the phone from Brussels after the hearing, Margaret Boribon, the Copiepresse secretary-general, said she felt very happy with how things proceeded today. "I can't see how the judge could change his opinion,'' she said, certain that the court will uphold the September ruling.
Perhaps that legal victory will come when the ruling is issued, which is expected in late December or January. If so, it may not help Copiepresse in the real aim of a financial deal. Google may have enough content to make Google Belgium viable without the participation of the papers Copiepresse represents. They'd then be left in a situation of asking Google for reinclusion or going without the substantial traffic Google News can send web sites.
On the other hand, Google's settlement with the groups following on an agreement earlier this year with the Associated Press seems likely to fuel further publishing groups pushing for such arrangements, especially in smaller markets where key content is put out by a small set of publishers. Banding together and sticking with exclusion, they can severely hamper a news search service.
Norway Upset With Google News Over Copyright Laws covers how Google is being challenged in Norway. That hasn't developed into a legal case yet, but it's hard to see how Google's going to be able to say no to some type of agreement there. Pandia also covers how in Denmark, publisher opposition apparently created the unprecedented case of Google asking for permission to index news sites, rather than the normal case of spidering and requesting an opt-out.
Search Engines, Permissions & Moving Forward In Copyright Battles from me covers how, in particular, Google's use of images for its news area complicates issues and is making it harder for search engines in general to defend opt-out spidering, which I support. That article calls on Google to stop the inclusion of news images, as well as a pullback on showing cached pages and scanning of in-copyright works without permission.
However, asking for permission to spider textual content for news search is likely to be as slippery a slope as cutting deals with publishers. It weakens the core legal position Google has argued over gathering textual content from the web, most recently against suggested copyright changes in Australia that it said might make search engines unworkable.
As a reminder, Microsoft was also challenged in Belgium. Microsoft Removes Belgian Content Without Court Order covers this more and how Microsoft's reaction was to drop those publications. So far, it hasn't apparently cut a deal for reincluding them and perhaps may not feel a market need to do so.
Judge Gives AFP Case Against Google More Time covers how a copyright case against Google by Agence France-Presse over news inclusion is still ongoing.
I plan to follow up with Google Monday and see what further details I can gather on the case. I don't expect terms to be disclosed, but it would be good to know if a financial arrangement of some type was reached. That happened in the AP case, though Google was adamant the agreement there was not to allow it to solve a legal problem with spidering.
Many saw this as spin. There are other things the agreement would give Google aside from the right to spider, as my Google-AP Deal Not Pay-Per-Click & Some Further Details covers in more detail. However, it also conveniently solved the spidering issues for Google.
Postscript: See Q&A On Google's Belgium News Agreements for more on this story since it was written.
Posted by Danny Sullivan at 5:04 PM | Permalink
Reuters reports Google France was sued by Flach Film, a French film producer, for copyright infringement. They claim their video, "The World According to Bush," was published on Google Video France and viewed more than 50,000 times before Google removed the video. The French film producer estimates its damages at $648,700, but Google said "our terms and conditions specify that users (Internet surfers) don't have permission to use videos which they don't own the rights to."
Google has put away $200M for copyright case legal issues with the YouTube acquisition.
Posted by Barry Schwartz at 9:05 AM | Permalink
Google To Go To Belgium Court Finally
The AP reports that Google is finally going to show up in court to present its side of the case in the Belgium copyright suit. Google did not show up to fight the publishers and papers in Belgium the first time the case was heard.
Posted by Barry Schwartz at 8:58 AM | Permalink
Melanie Colburn writes that Music Labels Lose Copyright Suit Against Baidu, which started back when Five Music Companies Sue Baidu in September of 2005. Baidu was previously ordered to stop these music downloads but it appears the ruling was overturned because all Baidu is providing are links to 3rd party sites that facilitate the music downloads, whereas Baidu does not participate in the downloads themselves. More details at the BBC News.
Posted by Barry Schwartz at 9:37 AM | Permalink
Pandia reports that Google News is in trouble again over copyright laws overseas. Google News Norway was launched and publishers are upset that Google is placing copyrighted images in the Google News home page. Mediebedriftenes Landsforening, an association of Norwegian media companies, claims Google "cannot make use of photographs without a proper agreement." This form of syndication is in "violation with Norwegian copyright law," says Dagens Næringsliv.
Google is also in trouble over copyright issues in Belgium (also see here) and in Australia.
Posted by Barry Schwartz at 9:04 AM | Permalink
First Google was rumored to be keeping $500 million back from the YouTube sale to settle possible legal problems. Then Google CEO Eric Schmidt said they weren't. Today, turns out they are. Google holds back stock in YouTube deal from the Associated Press covers the details about keeping 12.5 percent of the stock swap for one year "to secure certain indemnification obligations." What Eric Schmidt Meant When He Said Google Wasn't Holding $500 Million From YouTube For Lawsuits: We're Holding $200 Million from TechDirt does a summary, plus gives you a funny headline about the entire thing.
Posted by Danny Sullivan at 10:12 AM | Permalink
A Struggle Over Dominance and Definition is a good New York Times article out today that looks at Google and whether it is a media company that conflicts with other media owners, especially in terms of using content from others without permission. It also sparked me to finally finish a long piece I've been meaning to do on Google, search engines and copyright issues. Search Engines, Permissions & Moving Forward In Copyright Battles is now up over at my personal blog Daggle, covering the important difference between indexing and reprinting, how robots.txt already provides a permissions system, why Google should stop scanning in-copyright books and also be a leader in dropping cached pages.
Posted by Danny Sullivan at 11:58 PM | Permalink
AFP reports that Google has warned Australia that passing a certain new copyright law would set the country back to "the pre-Internet era." Google's senior counsel, Andrew McLaughlin, told the Senate Legal and Constitutional Affairs Committee, "If such advanced permission was required [to index pages], the internet would promptly grind to a halt." I believe the issue here is that Australia wants Google to get copyright owners to opt in to having their content indexed, archived and cached, as opposed to opting out via a robots.txt file. Australia is not alone here; Belgian newspapers are fighting Google over similar copyright issues. This all just amazes me, seriously.
Postscript From Danny: See also my Google's Belgium Fight: Show Me The Money, Not The Opt-Out, Say Publishers piece that goes into great depth about how this is effectively already the law in Belgium, due to a court ruling there. The appeal on that case will happen later this month, but the threat alone has also already caused Microsoft to back out of some indexing.
Posted by Barry Schwartz at 10:29 AM | Permalink
The Financial Times reports that Eric Schmidt's Google is running from media company to media company trying to offer upfront cash, in sums of "tens of millions of dollars," to slow and "halt" the threat they pose to YouTube. FT.com says that Schmidt met with CBS, Viacom, Time Warner, NBC Universal, News Corp and others recently. There are some more details over at paidContent.
Posted by Barry Schwartz at 9:27 AM | Permalink
Elinor Mills reports that Google has denied a report last week that it was fined $43 million for not removing all Belgian publishers' content from the engine's index and cache. Google spokesman Ricardo Reyes told Elinor Mills at News.com in an email, "Google has complied with the Copiepresse judgment and we are not aware of any fine. We believe this story to be completely untrue."
Posted by Barry Schwartz at 8:17 AM | Permalink
Gary Price points to a Poynter.org report showing that Google has been fined €34 million (about $43,231,000 USD) for not removing all of the Belgian publishers' content as required by a court ruling. Google claims it could not find all the publishers and asked the publishers for help in identifying the content that has to be removed.
Postscript: Google Says Belgium Did Not Receive $43.2M Fine.
Posted by Barry Schwartz at 9:42 AM | Permalink
More Details On YouTube & Google Acquisition
Blog Maverick has some intimate details on the Google YouTube Deal from a "trusted anonymous author" in a message board. Here are some of the excerpts:
The first request was a simple one and that was an agreement to look the other way for the next 6 months or so while copyright infringement continues to flourish. The second request was to pile some lawsuits on competitors to slow them down and lock in Youtube's position. Infringement lawsuits will be served on Youtube and the new proud parent Google in the coming months. Google will respond with two paths: an expensive legal fight or a quick and easy settlement with most choosing the latter.
Posted by Barry Schwartz at 9:26 AM | Permalink
The Register writes Microsoft dodges court in Belgian copyright battle, where they say Microsoft decided not to go to court over Belgian newspapers' request that it remove their content from its index. Google was ordered to remove the content by a Belgian court and then later lost an appeal on the same case. Microsoft simply did not want to fight them and decided to just comply with the cease-and-desist letter it received.
Posted by Barry Schwartz at 9:08 AM | Permalink
Amazon Turns Down Google's Request For Information On Book Search
Business Week reports that Amazon has turned down Google's request for information to help in its book scanning lawsuit. Amazon responded to Google's subpoena saying that it would make Amazon's trade secrets public and that it was "overly broad and unduly burdensome" on Amazon. In short, it is Amazon's way of telling Google to stop looking over its shoulder and work it out itself.
Posted by Barry Schwartz at 8:29 AM | Permalink
The NY Times has an extensive article today on Google and those who would challenge it in the courts. It offers a broad overview of the legal issues surrounding Google, including those coming with the YouTube acquisition, and the company's attitude toward litigation, which is typically to fight rather than settle.
In addition, Charles Cooper at CNET writes what can only be described as an angry column about Google and "Web 2.0," content and copyright infringement. The article is entitled, "Web 2.0 as a metaphor for 'rip off'."
Posted by Greg Sterling at 12:22 PM | Permalink
Reuters reports that YouTube erased 29,549 films and media files after receiving a complaint from "Japanese media companies over copyright infringement." Around the same time, the NY Times informs us that Music Companies Grab a Share of the YouTube Sale. The article says that the $50 million earned from this deal "should help to shield Google from copyright-infringement lawsuits." Universal Music last week sued two smaller video sharing sites, but not YouTube, for distributing pirated music and videos. Techdirt feels that the last-minute deal with the music companies, struck before Google bought YouTube, was YouTube basically handing over to "the labels Google's cash before any official deal was completed."
Posted by Barry Schwartz at 8:18 AM | Permalink
MarketWatch reports that a judge has consolidated two different cases against Google to make the process quicker and more "streamlined." Book publishers and book authors have joined together to battle Google on the legal front for copyright infringement allegations over Google's Book Search Project.
Postscript: Steve Bryant at eWeek reports that the Authors Guild v. Google case is postponed six months to January 2008. Steve said, "Doesn't that mean that Google, in the meantime, will continue to operate Google Books as normal, which is exactly what the Authors Guild wants to prevent?"
Posted by Barry Schwartz at 2:15 PM | Permalink
Earlier, we touched on the fact that Copiepresse was threatening to go after MSN for carrying Belgian newspapers in the way it went after Google. Via PaidContent.org, Update: MSN is latest target of Belgian copyright complaint from InfoWorld covers how Copiepresse is now negotiating with MSN Belgium after sending a cease-and-desist letter to MSN. Copiepresse hopes to gain a share of advertising revenue.
Meanwhile, MSN Belgium has removed some newspapers. Removed from where isn't clear. MSN Belgium does have a dedicated news area, so it might be from there. However, sites may also have been removed from web search results similar to what Google did. I tried a search for site:lesoir.be, and the main news site seems to have been removed.
InfoWorld also notes:
The group, which represents some of Belgium's best known newspapers, including Le Soir and Le Libre, has been gathering more support for its cause. It was joined this week by separate groups that represent Belgian photographers, journalists, scientific authors and multimedia publishers, who plan to back its efforts.
It will be interesting to see how many more groups they rally in support against the search engines, and how the search engines react. I think there's a big difference between search engines deciding they might pay to include relatively small amounts of content in specialized news search engines versus a frankly insane idea that they're going to negotiate deals for inclusion in regular web search results.
Ultimately, the good people of Belgium might find themselves without the ability to search the web, should Copiepresse succeed in its quest to establish that relying on robots.txt for permission is illegal.
I have much more to say on this subject -- I'm working on a piece I hope to post later this week. For some related material from me, see:
Posted by Danny Sullivan at 8:56 AM | Permalink
Google faces copyright fight over YouTube from The Guardian covers how Time Warner chair and CEO Dick Parsons said his company plans to go after YouTube for copyright violations. It's still talk rather than legal action:
Mr Parsons told the Guardian: "You can assume we're in negotiations with YouTube and that those negotiations will be kicked up to the Google level in the hope that we can get to some acceptable position."
I'm sure it will get kicked up. And it shouldn't be hard to get the right people connected given that the AOL part of Time Warner already has an existing distribution deal with Google. Of course, if that fails, it should be interesting to see if Time Warner sues a company that has a five percent ownership stake in AOL. Related coverage and commentary can be found via Techmeme, here.
Posted by Danny Sullivan at 8:35 AM | Permalink
Sean Daly, from Groklaw, interviewed Margaret Boribon of Copiepresse on September 28th about their copyright lawsuit against Google, which targets the use of Belgian news in Google News, and cached copies of those articles. He has posted their discussion, in English and French, as well as some commentary and analysis of the litigation, including some late breaking news involving demands made by Copiepresse for MSN, and a potential new plaintiff.
I've written a brief synopsis of some of the points she raises in the interview at SEO by the Sea. Danny also talked with Margaret Boribon earlier in September.
Posted by Bill Slawski at 12:32 PM | Permalink
Ballmer: YouTube Overvalued & Google Transferring Wealth From Content Owners
The Web According to Ballmer from BusinessWeek has Microsoft CEO Steve Ballmer questioning the value of the Google-YouTube deal and warning that Google is transferring wealth away from rights holders. It's an odd statement, since that's what Microsoft wants to do as well.
First the questioning of the YouTube value:
[You've got to ask] could Google do whatever it is they're hoping to buy without paying $1.6 billion? Is YouTube really some permanent, long-term thing, or is it a fashion?....Right now, there's no business model for YouTube that would justify $1.6 billion.
Strangely, though, when BusinessWeek tries to pin down what seems a clear statement that Google overpaid, Ballmer says:
I'm not saying it is overvalued. I'm not trying to say that. It depends on a set of factors. I'm not saying I wouldn't write a check for that amount of money. I might.
And back to the controversial statement about Google's relations with content:
And what about the rights holders? At the end of the day, a lot of the content that's up there is owned by somebody else.
The truth is what Google is doing now is transferring the wealth out of the hands of rights holders into Google. So media companies around the world are all threatened by Google. Why? Because basically Google is telling you how much of your ad revenue you get to keep. They better get some competition. Us. Yahoo! (YHOO). Somebody better break through or you can short all media stocks right now. As long as there are two, you can hold onto media stocks. Google understands that. And that's one reason why they're willing to lose money up front.
Microsoft has its own video sharing service up, Soapbox. It has a question answering service, Q&A. It has an entire search engine that crawls the web like Google, Windows Live. Microsoft has plans for contextual placement of ads on pages, similar to AdSense. It's specific to MSN content now, but that will inevitably change. All of these things leverage the content of others in order to make money for Microsoft. So if these actions transfer wealth away from content owners, Microsoft is just as guilty of it as Google.
Frankly, all Ballmer seems to be saying is that content owners would be better off if Microsoft were a strong third participant in the ad game. Sure -- but let's not kid ourselves. Microsoft would be a lot better off as well, and it didn't jump into the game out of some desire to counterbalance the power of Google. It's in it to make as much money as it can.
Posted by Danny Sullivan at 7:42 AM | Permalink
Just in from Bloomberg, Google to Subpoena Yahoo, Microsoft on Book Scanning covers how Google hopes that gaining information from rival book scanning programs will help it defend itself in copyright lawsuits over its own scanning program. From the story:
Google, which doesn't disclose how many books it has scanned, also wants to know the title, authors and copyright status of books already offered through competitors' book projects, according to the documents.
The right to subpoena has been granted, but information is to be kept confidential and used only in the litigation.
Posted by Danny Sullivan at 8:09 PM | Permalink
News.com has another great article named Copyright tussles for Google. It reviews some of Google's copyright cases and how Google is trying hard to win some of those cases for their current and future projects. From the Google Cache, to Google Images, to web search, book search and other indexing projects - Google needs to keep redefining the law to continue to build out their search engine. But you have to agree with the highlighted quote, "One of the challenges is, 'This is Google. What would the world be without Google?' We don't want the world without Google. We want the world without Google infringing our copyrights."
Posted by Barry Schwartz at 9:02 AM | Permalink
Last week, Google complied with a Belgian court order and posted the ruling against it in a copyright suit on the home page of Google Belgium and Google News Belgium, along with many other places including many search results pages. Now via Google Blogoscoped, news that the plaintiff in the case Copiepresse thinks the ruling should have gone at the top of the Google News Belgium page, rather than the bottom.
An article about the issue in Dutch is here. I don't speak Dutch, sadly, consigning me to AltaVista Babelfish, which translated a key part as:
That happened also, but on the start page of Google news, the topicality part of the site, stands the sentence entirely below. And that does not like Copiepresse.
Anyone hitting Google Belgium couldn't have failed to notice the beginning of the very long ruling, as the illustration above shows. But over at Google News Belgium, that ruling wouldn't have been seen unless you scrolled to the bottom of the page, past all the stories. That's what Copiepresse seems to be upset about.
The order did require that:
The defendant to publish, in a visible and clear manner and without any commentary from her part
Copiepresse might well be able to argue that on Google News Belgium, the ruling there wasn't clear and visible by being at the bottom of the page.
Of course, putting the long ruling at the top of the page would have been unworkable. The ruling itself didn't allow Google to put anything on the page directing people to see the notice at the bottom since that might have been deemed "commentary" about the ruling.
What next? If Copiepresse presses for more and wins, perhaps Google might have to run the ruling in a column alongside news content.
Frankly, Copiepresse comes across as petty in complaining here. Google already had a good argument that publishing the ruling was unnecessary given the wide press coverage the ruling had gained, though the court was not convinced and required the ruling to go up anyway. After that happened, coverage of Google's loss was only magnified. The point was made very publicly.
Posted by Danny Sullivan at 9:22 AM | Permalink
Our approach to content at the Official Google Blog has Google explaining to the world how it works with content owners and its desire to respect their rights.
In terms of copyright, Google stresses that it generally sticks to what's known as fair use, though the post doesn't use those words. The idea is that it shows very short summaries of stories and pages, plus thumbnails of images, but doesn't reprint this material, requiring people to click through to the actual material from places like Google News.
Of course, in the case of cached pages, many including myself would argue that Google goes beyond fair use. Cached pages are an example where content can be viewed without clicking through to the original site, and the opt-out approach for that doesn't feel appropriate at all.
Google also notes there are cases when it wants to go beyond fair use, to make broader use of content where permission would be required. The deal with the Associated Press is cited as one of several examples here.
To me, this is also a way for Google to help defuse the idea that some publications have, such as the Belgian newspapers recently, that Google can be bought off to avoid lawsuits. To me, this is Google stressing that it will do content deals in some cases, but that these content deals aren't necessarily being done to avoid lawsuits, especially when it feels it is acting within fair use guidelines. That's my speculation and take on this, of course. Google didn't comment when I asked if this was the reason for raising the AP deals.
Moving past Google saying it respects copyright, it then stresses that it allows people to opt out, even if it feels it has fair use rights. In general, I agree with this method, which Google along with the other major search engines generally follows. Trying to get permission from each web site to index it would be an impossible task, and one that's not necessarily even legally required. Opt-out through things like robots.txt is an effective way to protect rights holders plus benefit the public as a whole. I do hope they'll change cached pages to opt-in, however.
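To make the opt-out mechanism concrete, here is a minimal sketch of how a well-behaved crawler might consult a robots.txt file before fetching a page. It uses Python's standard urllib.robotparser module. The rules, the example.com URLs, the /archive/ path and the may_crawl helper are all made up for illustration; this is not how any particular search engine actually implements the check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration: one named crawler is kept
# out of /archive/, while every other crawler may fetch everything.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /archive/

User-agent: *
Disallow:
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("Googlebot", "http://example.com/archive/story.html"),
        ("Googlebot", "http://example.com/news/story.html"),
        ("OtherBot", "http://example.com/archive/story.html"),
    ]:
        print(agent, url, may_crawl(ROBOTS_TXT, agent, url))
```

The point of the sketch is that the publisher states the rule once, in one file, and the crawler does the work of honoring it.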
Google talked with me about the post shortly before it went live yesterday, to see if I had any questions. The main thing in my mind was if this was in response to the Belgian lawsuit. No, I was told. The post has been in the works for some time, apparently. Google's hoping it will help people better understand their approach to content.
Posted by Danny Sullivan at 7:56 AM | Permalink
Just a quick note that Google's posted on its official blog about the Google Belgian news issue that I've been covering, while William Slawski has a nice translation in the works on the ruling itself.
About the Google News case in Belgium from the Official Google Blog doesn't really provide much new information that you haven't already gotten in reports from me and others. What should it provide? How about answers to:
The post does stress that there are ways for publishers to easily stay out of Google. Those ways don't appear to have been presented to the court itself. Writes William Slawski in Belgian Copyright Ruling Against Google News:
I’m surprised by the lack of mentions of the use of a noarchive meta tag or noindex meta tags or by the use of robots.txt to disallow Google from indexing or archiving the pages of the newpapers in question.
While the Court does note that the onus of keeping copyright from being infringed falls upon the owner of the technology used to take text from the newspapers in question, this seems like an omission worth noting.
Regardless of how the Court may have felt about those options, I think that they should have been addressed in some manner. The failure to do so makes it appear that they either weren’t provided information about those by their expert, or didn’t understand them, or may not have addressed those issues on purpose.
A simple noarchive tag would have kept information on those pages from being cached by Google. A noindex tag or disallow directive should have kept their pages from being indexed at all by Google. Were they using these and Google ignored them? I suspect that they weren’t.
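For readers unfamiliar with the tags mentioned in that passage, the following is a rough sketch of how a crawler could read the meta robots directives on a page, using Python's standard html.parser module. The sample page, the RobotsMetaParser class and the robots_directives helper are hypothetical names invented for this example, and real search engines may interpret these tags with more nuance than shown here.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        content = attr.get("content") or ""
        for part in content.split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html: str) -> set:
    """Return the set of meta robots directives declared in the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page that stays in the index but asks not to be cached.
PAGE = '<html><head><meta name="robots" content="noarchive"></head><body>Story</body></html>'

if __name__ == "__main__":
    found = robots_directives(PAGE)
    print("may index:", "noindex" not in found)
    print("may show cached copy:", "noarchive" not in found)
```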
After some more analysis, including an important argument over whether Google is a portal competing with newspapers or a search engine (answer, in my view, probably both depending on whether you keyword search Google News or read by browsing), he provides a long and what seems fairly complete English translation of the French-language ruling.
For more background on the case, see my prior posts:
Posted by Danny Sullivan at 9:03 AM | Permalink
Google has now posted the text of a Belgian ruling finding it violated copyright on the Google Belgium home page. The ruling has also been posted to the home pages of Google Images Belgium and Google News Belgium, but not Google Groups Belgium.
Last week, a court ruled Google had violated the copyright of several Belgian newspapers by listing them within Google News. The court ordered the removal of those papers from Google, which the company quickly complied with.
The court also ordered Google to post the ruling on its Belgian web site within 10 days or face a heavy fine. Google appealed that punishment, but it was upheld last Friday.
Despite losing its appeal, Google looked ready to defy the order to post the ruling and take the fines, until a second appeal could be heard in November. Now, the company has reversed course. The ruling went up on Saturday. The company gave no reason for the reversal to Reuters:
A spokesperson for Google declined to elaborate on the reasons that made the company change its mind but said it would seek to cancel the ruling.
"We are pleased that a judge has given Google the opportunity to appeal the substance of this case. This will be heard in November," the spokesperson said.
From Dow Jones newswire:
Google spokeswoman Rachel Whetstone told Dow Jones Newswires the company had agreed to publish the ruling on its Web site after studying the court judgment.
Technically, Google never failed to comply with the court ruling. It has 10 days from receipt of the ruling to act, and it has done so within that time, saving it exposure to fines. As noted, a second appeal on the ruling will happen in November.
Past coverage is below:
Also, I note that Microsoft's Windows Live is now operating illegally under Belgian law. For example, site:www.lesoir.be shows how Le Soir -- one of the publications involved in the lawsuit against Google -- has pages listed in Windows Live, along with cached copies. In fact, here's an example of an article from Le Soir about the Belgian ruling against Google that I can read at Windows Live through its cached copy. To date, no news that Microsoft is about to be sued.
Finally, over at Threadwatch, an interesting comment points out that Google might have been OK in Belgium if it didn't show cached copies of pages:
The truly critical essence of this Belgian court ruling concerns Google's caching functionality. Here, protected content is being displayed a) in modified form; b) more often than not in its entirety (i.e. not restricted to mere snippets); and c) without copyright holders' permission. In most countries this would be viewed as a flagrant violation of copyright law - and obviously this is the stance the Belgian court has adopted. (And yes, there's been a contrary ruling by a US court, but that specific case seems to be rather more complicated on closer view; also, there's some indication that it was decided on arguably faulty assumptions, but that's another story.)
It is interesting to note that the Belgian ruling specifically acknowledges Google's right to store third party content (no mean concession, that, and far from self-evident) for search purposes only. But displaying it in the cache for everyone to see constitutes an act of re-publication which, like it or not, demands copyright holders' express permission.
This is a very important point. Search engines make copies of pages in order to make content searchable, as my Indexing Versus Caching & How Google Print Doesn't Reprint article explains in more detail. It's very difficult to argue this type of copying harms a site owner, especially when opting out is so easy.
Showing these actual copies through cached pages has long been disturbing for many people. While it's easy to opt out of such display, it feels a step beyond what a content owner should have to do. With cached pages, content is literally being reprinted rather than made searchable. It seems absurd for content owners to have to opt out in that instance.
Within the US, cached copies have so far been upheld, something I disagree with. But if Google were to eliminate them -- along with picture thumbnails -- it sounds like it might have a better chance of winning in Belgium.
Posted by Danny Sullivan at 5:51 AM | Permalink
Google loses appeal on posting court ruling from Reuters covers Google losing an appeal that it should not be required to post the ruling of a Belgian court over a copyright infringement lawsuit on its Belgian web search and news sites. It now will be fined 500,000 euros per day for each day it fails to comply. Google has a further appeal on the entire case, including posting the ruling, that will be heard in November. My past article Google's Belgium Fight: Show Me The Money, Not The Opt-Out, Say Publishers has more about that and the entire case.
Posted by Danny Sullivan at 12:10 PM | Permalink
Publisher Groups To Test New Search Engine Rights Management System (Updated)
Several mostly print publisher groups say they are to test a new "Automated Content Access Protocol" that they feel will head off conflicts with search engines. A release with more information is below.
Exactly how the system will work, why it is different or better than existing systems like robots.txt or meta robots tags, isn't explained. More details are promised to be unveiled at the Frankfurt Book Fair on October 6.
I'm planning to talk with the World Association Of Newspapers to learn more about their plans next week, so I may have more before the formal unveiling. I've had a very informal talk already, and the view seems to be to find a way to make the existing systems work better. That's appreciated, and it's something the search marketing community has long wanted. But it's something I hope will involve more than just a group of publishers with mostly print interests.
My Google's Belgium Fight: Show Me The Money, Not The Opt-Out, Say Publishers article from earlier this week explains how, in my view, the entire issue that has erupted in Belgium is less about keeping content out of search engines and more about trying to force them to pay publishers for inclusion. Right now, any publisher that feels copyright is somehow infringed by being in a search engine has a very easy, very selective way to keep whatever it wants out: robots.txt files or meta robots tags. These work on a web-wide basis, have the support of all the major search engines, and have been used by publishers of all types. They could definitely be improved -- but in the Belgium case in particular, using them would have solved the exact problem that was raised.
Here's the release:
GLOBAL PUBLISHERS HEAD OFF LEGAL CLASH WITH SEARCH ENGINES: NEW RIGHTS MANAGEMENT PILOT IMMINENT
In the week that the publishers of Le Soir and La Libre Belgique won their case in the Belgian Courts against Google for illegally publishing content on its news service without prior consent, the World Association of Newspapers (W.A.N.), the European Publishers Council (E.P.C.) the International Publishers Association (I.P.A.) and the European Newspapers Association (E.N.P.A), are preparing to launch a global industry pilot project that aims to avoid any future clash between search engines and newspaper, periodical, magazine and book publishers.
The new project, ACAP (Automated Content Access Protocol), is an automated enabling system by which the providers of content published on the World Wide Web can systematically grant permissions information (relating to access and use of their content) in a form that can be readily recognised and interpreted by a search engine “crawler”, so that the search engine operator (and ultimately, any other user) is enabled systematically to comply with such a policy or licence. Effectively, ACAP will be a technical solutions framework that will allow publishers worldwide to express use policies in a language that the search engine’s robot “spiders” can be taught to understand.
Gavin O’Reilly, Chairman of the W.A.N., said: “This system is intended to remove completely any rights conflicts between publishers and search engines. Via ACAP, we look forward to fostering mutually beneficial relationships between publishers of original content and the search engine operators, in which the interests of both parties can be properly balanced. Importantly, ACAP is an enabling solution that will ensure that published content will be accessible to all and will encourage publication of increasing amounts of high-value content online. This industry-wide initiative positively answers the growing frustration of publishers, who continue to invest heavily in generating content for online dissemination and use.”
Francisco Pinto Balsemão, Chairman of the E.P.C., said: “ACAP will unambiguously express our preferred rights and terms and conditions. In doing so, it will facilitate greater access to our published content, making it more, not less available, to anyone wishing to use it, whilst avoiding copyright infringement and protecting search engines from future litigation.”
ACAP will be presented in more detail at the forthcoming Frankfurt Book Fair on 6th October and will be launched officially by the end of the year. W.A.N., the E.P.C. and I.P.A. will run the pilot for a period of up to 12 months and it will be managed by Rightscom Ltd.
===
The European Publishers Council is a high level group of Chairmen and CEOs of European media corporations actively involved in multimedia markets spanning newspaper, magazine and online database publishers. Many EPC members also have significant interests in commercial television and radio.
The World Association of Newspapers groups 72 national newspaper associations, individual newspaper executives in 100 nations, 13 news agencies, and nine regional press organizations, representing more than 18,000 publications in all international discussions on media issues, to defend both press freedom and the professional and business interests of the press.
The International Publishers Association is a Non Governmental Organisation with consultative relations with the United Nations. Its constituency is of book and journal publishers world-wide, assembled into 78 publishers associations at national, regional and specialised level.
The European Newspaper Publishers’ Association is a non-profit association currently representing 5 100 national, regional and local newspapers. These daily, weekly and Sunday titles are published in 24 European countries where ENPA’s members are operating in their national markets.
Postscript: I've just received this briefing paper that explains more. I've skimmed it and attached one note marked in bold. Basically, the existing robots.txt or meta robots systems can do a lot of what's already described here. What they cannot do is help search engines access content because the publisher allows this only through a licensing agreement, something the Belgian publishers seem to want. In addition, the pilot can do all it wants. Unless some major search engines agree to cooperate, the pilot will go nowhere. Again, I'll follow up more on this next week after talking with the groups involved.
ACAP
Automated Content Access Protocol
A briefing paper for publishers on a project in planning
1 Executive summary
All sectors of publishing face a “search engine dilemma”. The value of search engines to users – and to those who publish on the network – is incontrovertible. However, search engine activities can be very damaging to specific online publishing models. The undifferentiated model of permissions management (essentially either allowing or forbidding search of content) is inadequate to support the diverse present and future internet strategies and business models of online publishers.
At the beginning of 2006, the major publishing trade associations established a Working Party, chaired by Gavin O’Reilly, Chairman of the World Association of Newspapers, to consider the issues that this has raised. As a result, the World Association of Newspapers and the European Publishers Council are planning a project which will develop and pilot a technical framework which will allow publishers to express access and use policies in a language which the search engine’s robot “spiders” can be taught to understand. This will make it possible to establish mutually beneficial business relationships between publishers and search engine operators, in which the interests of both parties can be properly balanced.
The project is provisionally called ACAP (for Automated Content Access Protocol). ACAP will develop and pilot a system by which the owners of content published on the World Wide Web can provide permissions information (relating to access and use of their content) in a form in which it can be recognised and where necessary interpreted by a search engine “crawler”, so that the search engine operator (and perhaps, ultimately, any other user) is enabled systematically to comply with such a policy or licence.
This paper is intended to brief publishers on the outline of this project and to encourage their active support and participation when the project is launched in September 2006.
2 Background – the “search engine” problem
At the beginning of 2006, the major Europe-based publishing trade associations – including the World Association of Newspapers (WAN); the European Publishers Council (EPC); the European Newspaper Publishers Association (ENPA); the International Publishers Association (IPA); the European Federation of Magazine Publishers (FAEP); the Federation of European Publishers (FEP); the World Editors Forum (WEF); the International Federation of the Periodical Press (FIPP) and Agence France Presse – established a Working Party to consider the issues that are posed by search engines for publishers, and to look at ways in which mutually beneficial relationships can be established between publishers and search engine operators, in which the interests of both parties can be properly balanced.
All sectors of publishing have a “search engine dilemma” (even if we disregard the particular problems that book publishers have with mass digitisation programmes). Search engines are an unavoidable and valued port of call for anyone seeking an audience on the internet. Search engines sit between internet users and the content they are seeking out and have found brilliantly simple and effective ways to make money from that audience. They have become so dominant that no individual website owner is large enough to have any serious impact on their commercial fortunes.
The benefits of powerful search technology to both users and providers of content are well recognised by publishers – although even “mere” search functionality can have a negative impact on some publishing business models. At the same time, publishers are aware that search engines are, in following their business logic, inevitably and gradually moving into a publisher-like role, initially merely pointing, then caching and, finally, aggregating and “publishing” and perhaps even creating content themselves, while using publishers’ content at will.
In the current state of technology, there can be none of the differentiation of terms of access and use which characterises copyright-based relationships in publishing environments, whether electronic or physical. The search engines can and do reasonably argue that, since their systems are completely automated, and they cannot possibly enter into and manage individual and different agreements with every website they encounter, there is no practical alternative to their current modus operandi.
Whether this (technological and political) gap is there by design or by accident, the search engines are able to make their own rules and decide for themselves whose interests are worth considering.
If publishers are to take the initiative in establishing orderly business relationships with the search engine operators, the response must be to help them to address the problem, both to fill the technical gap and ensure its political implementation. To paraphrase the former copyright adviser to the UK Publishers Association Charles Clark’s famous claim that “the answer to the machine is in the machine”, the challenges that are created by technology are best resolved by technology. Since search engine operators rely on robotic “spiders” to manage their automated processes, publishers’ web sites need to start speaking a language which the operators can teach their robots to understand. What is required is a standardised way of describing the permissions which apply to a website or webpage so that it can be decoded by a dumb machine without the help of an expensive lawyer.
In this way, one of the search engines’ most reliable rationalisations of their “our way or no way” approach will have been removed, and a structure which embraces and supports the diverse present and future internet strategies and business models of online publishers will have been created.
As a result of the work of the Working Party, a proposal was made to develop a permissions based framework for online content. This would be a technical specification which would allow the publisher of a website or any piece of content to attach extra data which would specify what use by search engines was allowable for that piece of content or website. The aim will be for this to become a widely implemented standard, ultimately embedded into website and content creation software.
Following the commissioning of a brief feasibility study, WAN and EPC have taken the initiative to establish a project to develop and pilot this framework to express publishers’ access and use policies. A detailed plan for this project – provisionally called ACAP (for Automated Content Access Protocol) – is currently in development.
3 ACAP – the vision
ACAP will develop and pilot a system by which the owners of content published on the World Wide Web can provide permissions information (relating to access and use of their content) in a form in which it can be recognised and where necessary interpreted by a search engine “crawler”, so that the search engine operator (and perhaps, ultimately, any other user) is enabled systematically to comply with such a policy or licence. Permissions may be in the form of
• policy statements which require no formal agreement on the part of a user
• formal licences agreed between the content owner and the search engine operator.
There are two distinct levels of permissions which need to be managed within this framework:
• The permission given to the search engine operators for their own operations (access, copy and download, cache, index, make available for display)
• The delegation of rights given to the search engine operators to grant permissions of access and use to search engine users (search, access, view, copy, download, etc)
Although these can be managed within the same framework, it is important that the differences between them are recognised.
4 Use Cases
We include two informal Use Cases which are illustrative of the type of challenge that we seek to solve through ACAP.
4.1 USE CASE A: NEWSPAPERS
Newspaper publisher A would like all search engines to index his site, but only search engines X, Y and Z may display articles (because they have paid a royalty) on their news pages, and then only for 30 days. All images must be fully attributed as they are in the newspaper. The newspaper publisher uses articles syndicated by other newspapers and news agencies and cannot grant permission for those items, to the extent of the third party rights. Articles should not be permanently cached.
NOTE FROM DANNY: Using existing systems, publishers privileged enough to be included in news search engines don't have their articles displayed. They have links to those articles displayed, along with a description, something that people do all over the web and is generally accepted as fair use. Specific search engines can be blocked, if that's the desire. Specific images can also be blocked. Publishers can require those reprinting their content to install blocks as well.
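To make Use Case A concrete, here is a minimal sketch of how the publisher's terms might be written down so a machine could check them. It is purely illustrative: the class name, field names and method names are assumptions made for this example, not part of any ACAP specification, and the engine names X, Y and Z are the placeholders from the use case.

```python
from dataclasses import dataclass


@dataclass
class NewspaperPolicy:
    """Hypothetical encoding of Use Case A; not an actual ACAP format."""

    index_allowed_for_all: bool = True                      # every engine may index
    display_licensees: frozenset = frozenset({"X", "Y", "Z"})  # engines that paid a royalty
    display_days: int = 30                                  # display window for licensees
    permanent_cache_allowed: bool = False                   # no permanent caching

    def may_index(self, engine: str) -> bool:
        # Use Case A lets all search engines index the site.
        return self.index_allowed_for_all

    def may_display(self, engine: str, article_age_days: int, third_party: bool) -> bool:
        # Syndicated items carry third-party rights the publisher cannot grant.
        if third_party:
            return False
        return engine in self.display_licensees and article_age_days <= self.display_days


policy = NewspaperPolicy()
print(policy.may_display("X", article_age_days=10, third_party=False))
print(policy.may_display("Q", article_age_days=10, third_party=False))
```

The attribution requirement for images is left out because it is an obligation on how content is shown rather than a yes/no permission, which hints at why a richer vocabulary than simple blocking is being proposed.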
4.2 USE CASE B: BOOKS
Book Publisher B invites search engine operators X, Y and Z to index the full text of his latest college text books. The web site where the full text is stored should not be made visible to search engine clients. He wishes that search engine users can browse only 2 pages of a maths book, but 20 pages of a philosophy text book. Search engine users should be able to buy individual chapters for private use, at $5 and $3 per chapter respectively.
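Use Case B adds limits and prices rather than simple yes/no permissions. A rough sketch, assuming a hypothetical table keyed by subject (the dictionary layout and function names are invented here for illustration and do not come from the ACAP paper):

```python
# Hypothetical per-subject terms taken from the figures in Use Case B.
BOOK_TERMS = {
    "maths": {"preview_pages": 2, "chapter_price_usd": 5},
    "philosophy": {"preview_pages": 20, "chapter_price_usd": 3},
}

# Only the invited search engine operators may index the full text.
LICENSED_ENGINES = {"X", "Y", "Z"}


def may_index_full_text(engine: str) -> bool:
    """True if the engine is one of the invited operators."""
    return engine in LICENSED_ENGINES


def may_preview(subject: str, pages_already_viewed: int) -> bool:
    """True if a search engine user may browse one more page of the book."""
    return pages_already_viewed < BOOK_TERMS[subject]["preview_pages"]


def chapter_price(subject: str) -> int:
    """Price in US dollars of one chapter for private use."""
    return BOOK_TERMS[subject]["chapter_price_usd"]


print(may_preview("maths", 2), chapter_price("philosophy"))
```

Note that the second function describes what the search engine may let its own users do, which is the delegated level of permission the paper distinguishes from the engine's own operations.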
5 Business requirements
Although it will be an integral part of the ACAP project to further develop and confirm the business requirements of publishers for the operation of the framework, significant progress has already been made in identifying the high level business requirements against which any technical solution must be measured. In summary, the solution must be:
• enabling not obstructive: facilitating normal business relationships, not interfering with them, while providing content owners with proper control over their content
• flexible and extensible: the technical approach should not impose limitations on individual business relationships which might be agreed between content owners and search engine operators; and it should be compatible with different search technologies, so that it does not become rapidly obsolete.
• able to manage permissions associated with arbitrary levels of granularity of content: from a single digital object to a complete website, to many websites managed by the same content owner
• universally applicable: the technical approach should initially be suitable for implementation by all text-based content industries, and so far as possible should be extensible to (or at the very least interoperable with) solutions adopted in other media
• able to manage both generic and specific: able to express default terms which a content owner might choose to apply to any search engine operator and equally able to express the terms of a specific licence between an individual search engine operator and an individual content owner
• as fully automated as possible: requiring human intervention only where this is essential to make decisions which cannot be made by machines
• efficient: inexpensive to implement, by enabling seamless integration with electronic production processes and simple maintenance tools
• open standards based: A pro-competitive development open to all, with the lowest possible barriers to entry for both content owners and search engine operators
• based on existing technologies and existing infrastructure: wherever suitable solutions exist, we should adopt and (where necessary) extend them – not reinvent the wheel
The approach taken should also be capable of staged implementation – it should be possible for initial applications to be relatively simple, while providing the basis for seamless extension into more sophisticated permissions management.
Although the scope of the project is initially limited to the relationship between publishers and search engine operators, a framework which meets these requirements should be readily extensible to other business relationships (although details of implementation would not be the same in every case).
6 The Pilot Project
The ACAP pilot project is expected to last for around 12 months. In outline, it is anticipated that the project will:
• confirm and prioritise the business and technical requirements with the widest possible constituency: agreement with all stakeholders is essential if the project is to succeed in the long term
• agree which specific Use Cases should be implemented in the pilot phase of the project, starting with a relatively simple approach
• develop the elements of the technical solution: it is anticipated that this will primarily involve the development of standards for policy expression, although it will also be necessary to develop the tools for the implementation of those standards
• identify a suitable group of organisations willing and able to participate in the pilot project; it is currently anticipated that this could involve four or five publishers and one of the major search engines; participants will need to be in a position to dedicate technical and time resources to the project to enable it to succeed
• pilot the standards and the tools, to prove the underlying concepts
In parallel with the development of the technical solution, a significant stream of project work will involve the development of a sustainable governance structure to manage and extend the standards (and any related technical services) which will be needed after the project phase of ACAP is complete. To avoid duplication of effort, ACAP will also establish liaisons with relevant standards developments elsewhere. In particular, the project is already in contact with EDItEUR with respect to its development of ONIX for Licensing Terms; and, in view of the significance of identification issues, with the International DOI Foundation.
7 Next steps
It is anticipated that the project will be launched publicly in September 2006; there is a great deal to be achieved between now and then, and at launch it will be possible to be much more explicit about plans and expectations. However, it is very important that the publishing community as a whole is ready and willing to respond positively when the project is launched.
The feasibility study commissioned by WAN, EPC and ENPA concluded that this project is technically feasible – and indeed requires little in the way of genuinely new technology. Rather, it requires the integration and implementation of identification and metadata technologies that are already well understood. It is also possible to chart a developmental path which does not demand that every element of the framework must be in place before any of it can be usefully implemented.
However, this is not to suggest that everything will be simple, nor that it can be achieved without cost. A significant part of the project cost will have to be borne by those organisations that agree to participate in the pilot, in the development of their own systems; however, there will also be central costs, to which it is hoped that other publishers will be prepared to contribute.
If you have any questions about this project, or would simply like to express your support, please contact: info@the-acap.org
Posted by Danny Sullivan at 10:41 AM | Permalink
I've had a long talk with the group that so far has successfully sued Google in Belgium over indexing, a talk that leaves me thinking they don't fully understand how search engines work and why their arguments over copyright infringement will ultimately fail. Then again, the case is really about trying to convince Google it should pay to carry their news content. A closer look at all this in the story below, as well as an update on the situation in general, including an appeal for Google that's been granted.
Let's go back to the beginning. In March, Copiepresse tells me it started legal proceedings against Google over its inclusion of Belgian news sources without explicit permission. The organization represents a number of publishers that were concerned over being indexed.
Information about the case, including a summons, was all sent to Google in the United States, according to Copiepresse. A hearing was held in Belgium on September 5th, then the ruling came out last Friday, September 15. Google didn't take part in the hearings, for reasons it says it is still investigating.
The ruling required that Google do two main things within 10 days of receipt:
Over this past weekend, Google says it complied with the first part. It removed links to at least these news sources, Google told me:
dhnet.be
grenzecho.be
lacapitale.be
lalibre.be
lameuse.be
lanouvellegazette.be
laprovince.be
lecho.be
lequotidiendenamur.be
lesoir.be
pressbanking.com
votrejournal.be
It's been noted that Google did more than remove these sites from Google News Belgium. They were removed from Google Belgium entirely. Here are a couple of searches that demonstrate this:
site:dhnet.be
site:grenzecho.be
site:lacapitale.be
site:lalibre.be
site:lameuse.be
site:lanouvellegazette.be
site:laprovince.be
site:lecho.be
site:lequotidiendenamur.be
site:lesoir.be
site:pressbanking.com
site:votrejournal.be
Some have thought this is an example of Google getting revenge, robbing these publishers of regular traffic they probably assumed was safe in a fight over Google News indexing. For its part, Google said its reading of the ruling meant that the sites had to be dropped entirely from Google Belgium:
Order the defendant to withdraw the articles, photographs and graphic representations of Belgian publishers of the French- and German-speaking daily press, represented by the plaintiff, from all their sites (Google News and "cache" Google or any other name) within 10 days of the notification of the intervening order, under penalty of a daily fine of 1,000,000.- € per day of delay;
I've bolded the key part. Google says it interpreted "all their sites" as being all sites that it views the court having jurisdiction over, anything using the Google.be domain. In addition, Google has removed the sites from Google News worldwide, saying it is treating the ruling as it would any request to be removed from Google News. In those cases, you're dropped entirely, not on a country-by-country basis.
The sites do still appear in searches via Google.com or other Google editions not aimed at Belgium. While these sites can still be reached from Belgium, Google considers them outside Belgian jurisdiction.
That view is sort of laughable, though I understand the reasoning well. It's unlikely that Google Belgium is actually being served up out of Belgium, so artificially pretending that Google.com and other Google sites are somehow "outside" Belgian jurisdiction makes no sense. However, this type of pretending isn't that unusual. It's a nice way for search engines to act like they are following the ruling of a particular country by making changes on "that country's Google." It's also a convenient way for particular courts to feel they've exerted jurisdiction over sites that they might really not be able to control.
Overall, Google has complied with the first part of the ruling. As for the second, it hasn't posted the required notices and says it will wait for a ruling due out Friday specifically about that issue. It argued yesterday in a hearing for appeal that posting the notice on the home pages wasn't necessary given all the publicity the case has now received.
An appeal for the case overall was granted. It will be heard on November 24, and the entire matter is largely in limbo until then. I hesitate to consider the case a victory for Copiepresse given that the first hearing -- for whatever reason -- had no defense from Google at all.
This leads me to Copiepresse's complaint with Google. In the group's view, Google has illegally copied material without permission. It feels that in some way, Google should get permission before indexing.
Indexing, of course, is not copying. Search engines do read pages in to make them searchable, as my Indexing Versus Caching & How Google Print Doesn't Reprint article explains in more detail. But indexing isn't reprinting pages, in the way some arguments try to make it. Google does show cached copies, something raised in the case. But cached copies aren't shown within Google News search, which was the main focus of this case (as an aside, one US court has ruled cached copies aren't an infringement, something I disagree with but something also easily rectified through no caching mechanisms).
I had a very long conversation about the permissions issue with Margaret Boribon, secretary general of Copiepresse, to try and better understand how they wanted Google to operate. Why not use commonly understood and effective mechanisms such as robots.txt files or meta robots tags to prevent indexing?
"If you do so, you admit that Google does what they want, and if you don't agree, you have to contact them. This is not the legal framework of copyright," Boribon said.
This is an age old issue in the search engine world. By default, search engines assume that permission is granted to index a document, in order to make it searchable. Technically, shouldn't they get explicit permission? Legally, that might make things safer. Logistically, it would never work. Many sites don't have clear contact details. Some domains themselves contain multiple sites. Moreover, there are millions of sites across the web. Contacting them all beforehand simply wouldn't work well.
I asked Boribon about this, how her group would propose search engines undertake such a task.
"I'm sure they can find a very easy system to send an email or a document to alert the site and ask for permission or maybe a system of opt-in or opt-out," she said.
Would it be OK for such a system to work automatically, I asked? Yes, that would be fine. A machine-to-machine connection would be OK, she said. So then, I asked, why not use the existing robots.txt or meta robots systems?
Both mechanisms are easy, automatic ways for publishers to declare if they grant indexing permission or not. In fact, I'd argue that both are a way for search engines to ask beforehand for the very permission that Copiepresse wants them to seek. Major search engines -- not just Google -- all request or check these blocking mechanisms.
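For anyone who hasn't seen the opt-out in action, here is a small sketch of the check a well-behaved crawler makes before fetching a page, using the robots.txt parser that ships with Python's standard library. The crawler names, the publisher.example domain and the rules themselves are made up for illustration; a real crawler would download the publisher's actual robots.txt file rather than use an inline string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve: one named crawler is
# shut out entirely, everyone else is only kept out of the archive.
ROBOTS_TXT = """\
User-agent: ExampleNewsBot
Disallow: /

User-agent: *
Disallow: /archive/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(user_agent: str, url: str) -> bool:
    """Ask the publisher's declared policy whether this crawler may fetch the URL."""
    return parser.can_fetch(user_agent, url)


print(allowed("ExampleNewsBot", "https://publisher.example/news/today.html"))
print(allowed("OtherBot", "https://publisher.example/news/today.html"))
print(allowed("OtherBot", "https://publisher.example/archive/2006.html"))
```

The meta robots tag does the same job page by page: a tag such as `<meta name="robots" content="noindex">` in a page's head asks engines not to index that page, and `noarchive` asks them not to show a cached copy.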
Boribon rejected the existing solutions. One issue she had was that they weren't legally endorsed. That's true, but that's also something I think will change over time. In the US, we've had one case recently where opt-out solutions like tags have been accepted.
Outside the US, there have been some scattered cases, such as this one from 1997 in the UK involving news indexing. But none of these cases have seemed to stop the search engines.
The Belgium case could be different. What happens in one country isn't applicable to others. It may be that Copiepresse will prove its point that permission should be sought in advance. Alternatively, a court could endorse existing blocking mechanisms as having legal force.
That's what I think should happen. These systems pose an easy way for anyone who doesn't want to be in a search engine to stay out. If the issue with Copiepresse was really about not being indexed, all of the publications it represents could easily stay out through those solutions. Google -- like other major search engines -- doesn't index sites against their wills.
There's more at work here, of course. The publications DO want to be in Google. The action is simply an effort to force Google to the bargaining table and get paid for inclusion, from what I can see.
"Our purpose is not to be excluded. Of course, we want to be in the system, but on a legal basis," said Boribon. "We want to be remunerated."
Her group's view -- as is the view of the World Association Of Newspapers that she also referenced several times -- is that Google is exploiting sites. It is making money off these sites and giving them little or nothing in return.
Most search marketers hearing this have to stifle laughter or disbelief. That's because most search marketers want all the search traffic they can get. It's free, easy and converts well. They understand that search engines give them plenty of value and complain most when something happens to take that traffic away, as was the case with the Google Florida Update o