April 10, 2008

SEO Spam is Good

TechCrunch recently had a post lamenting the fact that Barnes and Noble's new how-to site, Quamut, is being spammed by SEO guys looking for some free link juice. The B&N site wasn't adding nofollow to its external links, so it's been open season for SEOs. (Before you get all excited, they've now changed the links to nofollow.)

To many people, that SEO spamming may look like a bad thing. I think it's the best thing that could ever happen to Quamut.

Unlike other types of spam, good link spam carries with it a wealth of benefits for the site being spammed:

1. It brings users. When a new social site debuts, especially when it is a "me-too" site like Quamut, getting users is tough. Unless you offer some special incentive, or your site provides something necessary that other sites don't, you have to fight a tough battle for users. If someone wants to add link spam to your site, they need to sign up. The thousands of SEO Spammers out there can quickly become thousands of new members of your site. And when the spammers sign up under multiple accounts, they can quickly become tens of thousands of new members.

2. It adds content. It might not be the best content ever written, but SEO spammers do know how to write content that, at the very least, is unique, keyword-rich and geared to any user that might stumble upon it. Contrary to popular belief, SEO spammers are not interested just in backlinks, but also in filling up the SERPs. If they can get a page on your site to rank by combining their content with the strength of your site, and then convince the user to shift to their site, the bottom line stays the same.

3. It raises stature. When your brand spanking new social network has 10,000 members and 50,000 UGC articles after only one month, your site starts to get noticed--even if most of those members are spammers and that content is primarily spam. There's a reason companies like MySpace and YouTube didn't crack down on spammers--and even explicitly allowed spam in their original Terms of Service. If you want to grow--and grow fast--no one will help as much as spammers.

SEO Spammers contributed to padding out Wikipedia; for every great article that was written to insert a spammy link, Wikipedia got a great article. They helped get YouTube to critical mass; for every YouTube embed done to get a YouTube backlink, YouTube got more video views. SEO Spammers keep MySpace growing. Do you still know anyone with a MySpace account? Can you tell me how their growth keeps skyrocketing? Check the inbox of your old MySpace account and you'll see how.

In short, SEO Spammers are helping the internet continue to grow. As each once-spammed site gets big on the shoulders of spammers, it introduces methods to lock the spammers out, and the spurned SEOs move on to new sites. The cycle continues--and, with it, innovation in the social and user-generated content fields.

If popular sites are suffering under a flood of spam, I sympathize with their decision to add nofollow to their links and put up barriers to stop spammers--as long as they don't forget who made them popular to begin with.

Posted by Eli Feldblum at 10:22 AM | Permalink | Comments (0)

Microsoft to Fight Search Spam by Analyzing Email

Here's a story I missed when it broke. On March 25, Microsoft was awarded a patent it applied for nearly 4 years ago, to fight search spam based on external elements, like "electronic documents," or email. The prevailing theory is that similar indicators will show up in spammy emails and spammy blog comments and other SEO spam.

Given the resurgence of spam from SEO companies, Microsoft may also want to use the spam filters built into Outlook to highlight potential SEO spammers, working on the theory that spammers are spammers, in any and all fields. No question that this approach may be susceptible to some level of abuse, but given the number of people using Office, it's unlikely that subscribing to your competitor's newsletter and then tagging it as spam will really affect them. SeoByTheSea wants to take it even further, and suggests that the URLs in spam emails get tagged as SEO spam as well.

But before we get all that excited about what direction Microsoft (or MicroHoo) can take this innovation, we need to remember how poorly Microsoft has used this to deal with SEO spam in the past 4 years. A Google or MSN group with any keyword in the title will still rise, almost automatically, to the top of Live.com SERPs, regardless of its relevance. Let Microsoft fix that loophole first, and then go after email/SEO spam convergence.

Posted by Eli Feldblum at 7:59 AM | Permalink

March 20, 2007

Microsoft Researchers Show How Advertisers Are Funding Search Spam

In a paper entitled Spam Double-Funnel: Connecting Web Spammers with Advertisers, researchers at Microsoft and the University of California, Davis show the path whereby the ads of legitimate web site owners come to be shown on spam pages. The paper, reported on Monday in a New York Times story, Researchers Track Down a Plague of Fake Web Pages, is to be delivered in May at the 16th International World Wide Web Conference in Banff, Alberta, Canada. The paper’s methodology, findings and conclusions are of interest to search marketers.

For this paper the researchers focused on redirection spam (for examples of redirection spam), where Web pages redirect browsers to known spam-controlled domains. Many of these redirection spam pages use pay-per-click advertising and frequently display ads from reputable advertisers. Many research papers on search spam are essentially descriptive, seeking to categorize the various forms of search spam. This paper provides a means of identifying not just how these redirection schemes work but also who is involved in them.

To unravel these redirection schemes and identify the sources, the researchers simply “followed the money,” analyzing the end-to-end redirection paths (for more on the methodology and how you can use similar tactics, see Strider Search Ranger). In the paper they outline the methodology they used to analyze the tens of thousands of spam links found for this piece of research. To describe their findings they created a five-layer double-funnel model that includes:

- Doorway pages
- Redirection domains
- Aggregators
- Syndicators
- Advertisers

Spammers control the doorway pages and redirection domains; aggregators buy traffic from the spammers and sell it to the syndicators, who in turn are paid by the advertisers to display their ads. The system works both ways.

For their study, the researchers used 1,000 keywords spread across ten spammer-targeted categories: spammer-targeted keywords in one set and the most-bid advertiser keywords in a second. Predictably, the categories included:

- Drugs
- Adult
- Gambling
- Ringtones
- Money
- Accessories
- Travel
- Cars
- Music
- Furniture

The results of the analysis and the conclusions are of particular interest:

For Layer #1 – the doorway domains. The free blog hosting site blogspot.com was responsible for one in every four spam appearances in the top search results. At least three in every four unique blogspot URLs that appeared in the top 50 results were spam. (Aside – this is not news to most search marketers, but it is nice to see real hard data on this.)

For Layer #2 – the redirection domains. The spammer domain topsearch10.com figured prominently, and the 209.8.25.150~209.8.25.159 IP block where it resided hosted multiple domains responsible for 22-25% of all spam appearances.

Layer #3 – the aggregators, which the authors believe are a bottleneck and present the best target for attacking search spam. Two IP blocks 66.230.128.0~66.230.191.255 are responsible for the 100,000 spam ads in the sample (Aside -- Talk about a bad neighborhood).

Layer #4 – the syndicators includes just a handful of ad syndicators who serve as middlemen for the majority of the spammers.

Layer #5 – the advertisers includes many well known reputable advertisers whose ads garner traffic funneled through the system. It is advertiser money that fuels the entire system.
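For readers who want to check their own redirect logs against address ranges like the one reported for Layer #2, here is a minimal Python sketch. The range is the one quoted above; the host names and resolved addresses are made up for illustration, and this is not the researchers' tooling.

```python
import ipaddress

def ip_in_range(ip: str, first: str, last: str) -> bool:
    """Return True if ip lies in the inclusive range first..last."""
    addr = ipaddress.ip_address(ip)
    return ipaddress.ip_address(first) <= addr <= ipaddress.ip_address(last)

# Range quoted in the post for the redirection layer.
REDIRECT_BLOCK = ("209.8.25.150", "209.8.25.159")

# Hypothetical redirect hops and the addresses they resolved to.
hops = {
    "redirect-a.example": "209.8.25.153",
    "redirect-b.example": "192.0.2.10",
}

# Keep only the hops whose address falls inside the reported block.
flagged = [host for host, ip in hops.items() if ip_in_range(ip, *REDIRECT_BLOCK)]
print(flagged)
```

The same helper works for any of the ranges written in the first~last form used in this post.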

The authors hope that their paper will help search engines strengthen their ranking algorithms and will provide impetus for advertisers to carefully scrutinize their involvement with syndicators and traffic affiliates.

Posted by Amanda Watlington at 1:06 AM | Permalink

November 24, 2006

Yahoo Image Search Bug Showed Sex Images For Innocent Search

Yesterday, on Thanksgiving, The Register reported that a search at Yahoo Images for franchise returned very offensive and disturbing images. I will not describe the images, but I saw them myself, and as soon as I saw them, I emailed my contacts at Yahoo. Soon after, the images were pulled from the search results. It seems to me that someone figured out a way to easily insert pornographic images into Yahoo Images for a search term even with safe search on. The Register has blurred and censored screen captures of the first line of results.

Posted by Barry Schwartz at 8:47 AM | Permalink

November 22, 2006

Link Exchanges Are Spam Links According To Microsoft

The other day, in Microsoft Banning Sites from Live.com For Link Exchanges, I covered an email sent to a Webmaster. The email stated that a particular site was removed from the Live.com Search index because the site was "acquiring links through posting to or exchanging links with sites unrelated to your site content." The email also added that these types of links are "spam links," which is the reason the site was delisted from the index.

It struck me that this is why Google and Yahoo remain very vague when telling Webmasters why their sites are deindexed or penalized. Simply put, people may look at this email and figure that exchanging links with your friends is a bad thing. If you have a personal blog about your life and you want to link to your dad's dental practice web site, there is nothing wrong with that. But if you do run huge link exchanges, then you need to be worried. The email sent to this Webmaster might not be clear enough to explain the difference, and may get other Webmasters worried.

Posted by Barry Schwartz at 9:28 AM | Permalink

Social Search Manipulation: Case Study

Niall Kennedy has one of the most thorough write-ups on why search spam exists with his article "The Spam Farms of the Social Web." The article explains how he stumbled upon a spam site and researched it to death, estimates how much money such sites can make, and covers the services that help them rank well. This includes a look at blogs, digg, del.icio.us, other social sites, link building tactics, directory inclusion, content writing, and more.

Posted by Barry Schwartz at 8:54 AM | Permalink

November 10, 2006

Hack Reveals How To Remove Sites From MSN Live Search?

Boogybonbon.com has revealed how you can potentially de-list your competitor's site from Microsoft's search engine. In short, most sites return a 200 status code when you request a page like domain.com/index.html?test=test or domain.com/index.html?test=test1234, etc. You can play on that by convincing Microsoft that a particular site has hundreds or thousands of duplicate pages, and at some point, Microsoft may penalize the site with a duplicate content penalty, where they de-list the site and its home page. That is the short story; if you want the long write-up, visit Boogybonbon.com.
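One possible defense is to stop treating every query-string variation as a distinct page. The Python sketch below is only an illustration of that idea, not something taken from the Boogybonbon.com write-up: `canonical_url` and its `allowed_params` argument are hypothetical names, and it assumes the site knows which query parameters it actually uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def canonical_url(url, allowed_params=frozenset()):
    """Drop query parameters the site does not use, and drop any fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in allowed_params
    ]
    kept.sort()  # stable ordering so equivalent URLs compare equal
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

# Both junk variations from the example collapse to the same address.
print(canonical_url("http://domain.com/index.html?test=test"))
print(canonical_url("http://domain.com/index.html?test=test1234"))
# A parameter the site really uses is preserved.
print(canonical_url("http://domain.com/index.html?test=x&id=7", {"id"}))
```

A site could then redirect any request whose URL differs from its canonical form, or answer unknown parameters with a 404, so that a crawler never sees thousands of 200 responses for the same content.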

Postscript: Other coverage at Threadwatch and Search Engine Watch Forums.

Posted by Barry Schwartz at 9:32 AM | Permalink

November 1, 2006

Yahoo's Tim Converse Colors SEOs

Tim Converse, the "spam fighter" at Yahoo, has a fun post he named Search engine optimization (SEO) from black to white. He tries to add nine colors between black and white. For example, a "dark gray" SEO is an SEO that "collects (aka steals) random text from other sites, and uses it to create thousands (or millions) of pages targeting particular queries. The pages have nothing original of value, but do have ads." The new shades of black and white include: Dark inky black, Charcoal, Dark gray, Slate gray, Gray, Light gray, Off-white, White, and Luminescent pearly white.

Posted by Barry Schwartz at 9:12 AM | Permalink

October 19, 2006

United Press International Selling PageRank

Threadwatch reports that the United Press International is selling links based on PageRank values. If you visit the advertising section, specifically for text links you will clearly see UPI marketing those text links to manipulate rankings and not for direct traffic building purposes like this:

The benefits they list include "increasing Page Rank" and "improving search engine results." Plus UPI listed out pages, with their current PageRank values and backlink counts for marketing reasons.

I have posted a screen capture at Flickr to document the full page before Google removes all value from the links on this site. :-)

Posted by Barry Schwartz at 9:19 AM | Permalink

September 27, 2006

John Battelle Talks With Matt Cutts & Nofollow Attribute The Same As Meta Robots Nofollow?

John Battelle has a short interview with Google spam fighter Matt Cutts. The most interesting part I found was news that the W3C has added a meta nofollow tag to their page with paid links, which Matt seems to say is the same as the completely different nofollow attribute and thus something acceptable for those selling links who fear the wrath of Google to do.

Let's back up. You can put a meta robots tag on your pages with the value of "nofollow," as described here. This tag, about 10 years old now, long predates any concerns about link selling skewing search results or the nofollow attribute. It is supposed to tell a search engine not to follow any links on a page, for purposes of indexing those links.

In other words, you've got a page with 20 links leading to other pages in your web site. Put nofollow into a meta robots tag, and you're telling the search engine not to follow the links on that page to those other pages.

An important note. Just using nofollow doesn't protect those other pages from being indexed. If there are any other links pointing at them from anywhere on the web, search engines will follow through to them that way. So if you don't want them indexed, you need to make use of a meta noindex tag or a robots.txt file to specifically block them.

Now on to the nofollow attribute. Created in January 2005, it was a way to flag particular links to search engines as those a site owner doesn't explicitly approve of. It was never defined as a means of telling search engines not to actually "follow" the link. It was more a way to say that you don't endorse the link. In fact, to my knowledge, Yahoo and perhaps others will still "click on" or follow links even if they make use of the nofollow attribute.
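To make the difference between the two concrete, here is a small Python sketch that reads a page and reports the page-level meta robots directive separately from the per-link attribute. The class name `NofollowAudit` and the sample markup are invented for illustration; it shows where each signal lives in the HTML, not how any search engine interprets it.

```python
from html.parser import HTMLParser

class NofollowAudit(HTMLParser):
    """Record page-level and link-level nofollow signals separately."""

    def __init__(self):
        super().__init__()
        self.page_nofollow = False  # <meta name="robots" content="...nofollow...">
        self.link_nofollow = []     # hrefs of <a rel="nofollow" ...>
        self.plain_links = []       # hrefs of links without the attribute

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            directives = {d.strip().lower() for d in (attr.get("content") or "").split(",")}
            if "nofollow" in directives:
                self.page_nofollow = True
        elif tag == "a" and attr.get("href"):
            rel_values = (attr.get("rel") or "").lower().split()
            target = self.link_nofollow if "nofollow" in rel_values else self.plain_links
            target.append(attr["href"])

# Made-up page: a meta robots nofollow plus one link carrying the attribute.
SAMPLE = """<html><head><meta name="robots" content="index, nofollow"></head>
<body><a href="/about.html">About</a>
<a rel="nofollow" href="http://www.example.com/">Sponsor</a></body></html>"""

audit = NofollowAudit()
audit.feed(SAMPLE)
print(audit.page_nofollow, audit.link_nofollow, audit.plain_links)
```

The meta tag is a single statement about every link on the page, while the attribute is set link by link, which is why treating them as equivalent is a notable claim.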

Now to the W3C. W3C Selling PageRank Or Thanking Supporters? covers how some have felt they've effectively been selling links without using the nofollow attribute that Matt Cutts in particular has urged those selling links to use, lest they potentially be penalized by Google.

In Matt's interview, we read that using nofollow in the meta robots tag might be seen as the same thing as a nofollow attribute, at least in Google's eyes. That's a completely new thing to me. I've commented on Matt's blog post about the interview, to see if he'll clarify more.

Aside from nofollow, the interview also gets into some interesting discussion of whether Google should do more to use humans in refining results.

Posted by Danny Sullivan at 7:42 AM | Permalink

August 25, 2006

Tim Converse Of Yahoo Talks About Aggregation Spam

Tim Converse, the web spam fighter at Yahoo Search, wrote a very interesting blog entry explaining aggregation spam. In short, aggregation spam is a form of content spam where you scour the web for matches on a specific keyword phrase, then compile a page of content with snippets and chunks of content found containing that keyword phrase and related keywords around it.

Tim offers up this extreme analogy:

Imagine that you get home one night to find a stranger leaving your house with a sack containing your TV, cell phone, jewelry. You might misunderstand, until we explain that he's actually an aggregator - he's just aggregating your belongings.

Tim explains that it is hard for the search engines to draw a fine line in the sand between high-quality aggregation that should be included in the search engines and aggregation that should not. But one thing he personally believes is that "the bar for inclusion ought to be pretty high."

Read Tim's personal thoughts on aggregation and search at his blog.

Posted by Barry Schwartz at 8:08 AM | Permalink

August 22, 2006

How XSS HTML Injection Might Let Others Put Links On Your Sites

SEOMoz has some excellent examples of government sites that are susceptible to cross-site scripting (XSS) HTML injection, something that can also happen to any site. Let me first do my best to explain what this means in layman's terms (hope I get it right).

In the examples shown at SEOMoz, they were able to add a link that looks like "<h1><a href="http://www.example.com">Look, I made a link</a></h1>" in the HTML to a new page hosted on a .gov site. Now, the page is a brand new, dynamically generated page, because the HTML itself is injected via the URL, which may look something like:

textQuery=%3Ch1%3E%3Ca+href%3D%22http%3A%2F%2Fwww.example.com%22%3ELook%2C+I+made+a+link%3C%2Fa%3E%3C/h1%3E

The examples are still live; here is one of the twenty: epa.gov link.

Now, if the search engines index this page - and they will, if there are enough links pointing to this new page - they may assign higher weight to the links on this page, since it is on a .gov site, and thus benefit the injected links.
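To see why the injected markup ends up as a live link, and what the usual fix looks like, here is a small Python sketch. The two render functions are hypothetical stand-ins for a search-results template; they are not the code running on any of the sites mentioned.

```python
import html
from urllib.parse import unquote_plus

# The kind of value an attacker places in the textQuery parameter.
raw = ("%3Ch1%3E%3Ca+href%3D%22http%3A%2F%2Fwww.example.com%22%3E"
       "Look%2C+I+made+a+link%3C%2Fa%3E%3C/h1%3E")

# What the server sees after URL decoding.
decoded = unquote_plus(raw)

def render_unsafe(query):
    """Vulnerable: echoes the query into the page as-is."""
    return f"<p>Results for {query}</p>"

def render_escaped(query):
    """Safer: HTML-escapes the query before echoing it."""
    return f"<p>Results for {html.escape(query)}</p>"

print(decoded)
print(render_unsafe(decoded))   # contains a real <a> element a crawler can follow
print(render_escaped(decoded))  # the markup is shown as text, so no link exists
```

Escaping on output is the simplest piece of the fix; a real site would also want to validate what it accepts in the query in the first place.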

This exploit was first made public in mid-June. This is something that can happen to almost any site or any server. Google itself is not immune to this exploit; it suffered from it in early July. And I also had a vulnerability in one of the tools at rustybrick.com that people began exploiting.

I personally commend SEOMoz for posting the details on the 20 governmental sites with this exploit. The site owners should ensure that their sites do not have this vulnerability, and someone pointing it out will help (encourage) them to do something about it.

Posted by Barry Schwartz at 8:21 AM | Permalink

July 7, 2006

Business.com Adds Nofollow To Many Links

Threadwatch reports that Business.com has added the nofollow attribute, a method of telling search engines not to count particular links as a "vote," to many of its outbound links. Aaron Wall discusses how the use of the nofollow in this sense "muddies their credibility" by saying they have links in their directory that they don't trust. But it appears that only those that pay Business.com for a directory listing get a link without the nofollow added to it. Everyone else who is accepted into the directory is tagged as untrusted. That's the exact opposite of how Google's Matt Cutts has said he thinks nofollow should work.

Postscript: Business.com - Use of "No Follow" Tags Explained has Business.com explaining why it uses nofollow in some cases and not in others. Postscript 2: Business.com's "No Follow" Policy Revision has Business.com changing how it uses nofollow.

Posted by Barry Schwartz at 8:41 AM | Permalink

July 3, 2006

BBC News Features Article On Google Search Spam

A BBC News front-page article named Google to stay focused on search brings the issues of search spam to the public. The article explains how seventy percent of Google's focus is on Web search and then goes into several paragraphs on how search spam is a huge issue. The article quotes Douglas Merrill, of Google engineering, saying, "Spam is an arms race," explaining that "spammers are highly motivated. There is a lot of money at stake."

Posted by Barry Schwartz at 9:38 AM | Permalink

June 26, 2006

MSN Talks Spam Defenses; Takes Weekends Off From Indexing

This morning I uncovered two threads at WebmasterWorld that provide information on MSN from spam defense to when search indexes get updated. The first is named MSN Asks Webmasters, What is Spam? where MSNdude provides some insights into how MSN determines what is spam, what are junk pages and determining the "hierarchy of spam." The second is named MSN Won't Do a Search Index Update on Fridays, Saturdays or Sundays where we see MSNdude posting that normally MSN will not conduct a search index update on Saturdays and Sundays, and also they are unlikely to conduct an update on Fridays, because it may affect their weekends.

Posted by Barry Schwartz at 9:43 AM | Permalink

June 22, 2006

When's Matt Cutts Back From Vacation Countdown Clock

Thomas Bindl does what I was hoping someone would do -- make a countdown clock for when Google's Matt Cutts is returning from his vacation, spotted via Threadwatch. I've seen a number of posts in various places suggesting that Google has been having its recent spam and indexing problems because Matt's finally taken a nice, long break. Bull. Matt's great, a huge resource to Google, but the problems going on seem far more fundamental than Matt being away. If they really are due to him being gone, then Google has even bigger issues to deal with. Still, plenty of us will be happy to see him return and jump back into the search conversation.

Posted by Danny Sullivan at 10:46 AM | Permalink

June 20, 2006

Google Sub Sub Domain Issues Clearly Visible

Threadwatch reveals some more examples of issues Google is having. They note a search on queer forum returns CraigsList 97 times out of the top 100 results. That is not all: a search on wedding forum returns about 50 of 100 results from CraigsList's site; just scroll down to number 50 and you will see.

Is CraigsList spamming? No! Is Google suffering? :) Google is clearly having issues with sub sub domains. Continued coverage of Google's public index issues.

Postscript From Danny: Comments at Threadwatch also note Yahoo has the same issue. MSN does not suffer as badly (but that could be the result of spidering fewer pages) and Ask looks very good.

Posted by Barry Schwartz at 8:19 AM | Permalink

June 19, 2006

Google Yanks Sites 5 Billion Pages After Spam Complaint

I covered a DigitalPoint thread which uncovered several domains that were able to rank billions of pages at the top of the Google results within a couple of weeks. The methods deployed to rank the pages seemed to include excessive use of subdomains, cloaking, content scraping, Alexa traffic boosting and blog comment spam. I listed the documented steps here. Some suspect that Google's new URL handling with the Big Daddy update allowed "old school" cloaking to begin working again.

A Threadwatch post shows screen captures of the spam and also has a comment from Google representative, Adam Lasnik. Adam directly responds to over 5 billion pages of this domain being indexed, saying:

We have noticed that some site: queries are showing bizarre results and it's turned out to be tied to a bad data push. We're fixing it now.

Yes, we are aware of the site command issues (Google's mentioned them itself). That may mean it is far less than 5 billion pages indexed in this case -- but still, plenty of pages got through.

Whether or not the site command is the issue, this is still indicative of other substantial problems plaguing Google that are making the rounds on discussion boards and blogs lately.

Posted by Barry Schwartz at 9:09 AM | Permalink

June 14, 2006

New Public Link Spam Exploits

Peter Da Vanzo has posted information on XSS Redirects & SEO. Peter linked to two documented methods of exploiting comments and links at blogs and other sites. The two links are XSS and Redirection Attacks, which makes for a nice and interesting educational read, and Moveable Type Backlink Exploit, which makes me a little depressed (running MovableType and all). Point being? The nofollow attribute, created to slow down link spam, has not worked, IMO. I actually had to pull comments and trackbacks completely from my blog after 3 years of them being enabled. Sad.

Posted by Barry Schwartz at 8:53 AM | Permalink

June 12, 2006

How Google Is Killing The Internet

Seth Jayson has written an interesting piece, "How Google is killing the internet," over at The Motley Fool. It's a lengthy analysis which takes as part of its premise that web authors are so desperate to get visitors to click on their AdSense links that they're creating pages of junk without any useful content. As a result, the content that is returned as the result of a search (not just on Google but on its competitors' websites as well) is valueless. I'm rather ambivalent about this, but the implications for search are interesting to say the least.

In common with Jayson I've run searches that return very little useful content or, almost as irritatingly, have visited a page with good data that has been spread over 4 or 5 pages to maximise the number of adverts I have to look at. Despite SEO claims that the best way to get a good ranking in Google is to have really good content, some pages that rank highly in the results have got there due to dubious methods such as cloaking or link farms. The argument runs that although Google should stamp down on activities such as this, there is little incentive for them to do so because AdSense brings in so much of their revenue.

Well, yes and no. Obviously Google wants to make money, but equally the only way that they will achieve this is if people continue to use their resources. If the average searcher becomes disenchanted with Google, they do have other options available to them, with Yahoo, Microsoft and Ask already trying to get rather more than a foot in the door. Although Google is constantly releasing new utilities in order to get people to use their entire raft of products, their key focus is, according to Marissa Mayer, still all about search.

As a searcher, what I (and everyone else) want from a search is an authoritative answer from a trustworthy source. What any search engine needs to do is give me a good reason to visit any particular website that is returned in the results. While I trust Google to do that, the key is not to trust it too much. If the searcher can retain a skeptical viewpoint with respect to the information that is returned to them, they're not going to go too far wrong. Searchers need a blended approach, combining robot-powered solutions with resources created by human beings: indexes, virtual libraries, gateways and swickis, for example.

So I don't think that Google is killing the internet; that really is a statement too far. If Jayson is correct and search results are getting clogged up at Google, it is going to have the opposite effect - the more that people are dissatisfied with the results they get, the more likely they are to explore alternative methods of getting the information that they need. Indeed, as Jayson does actually point out towards the end of his article, other search engines are constantly striving to surpass Google and there are plenty of examples where this is already happening. The limitations of Google as reflected in poor results give greater scope for other search engines and other search solutions, which has to be a healthy situation.

Posted by Phil Bradley at 9:23 AM | Permalink

June 8, 2006

Unique Content VS. Plagiarism In The Eyes Of An Algorithm

Chris Boggs over at the Search Engine Roundtable wrote an item named Which Came First: the Content or the Plagiarism? which discusses the challenge search engines face when it comes to determining the original source of a particular piece of content.

For example, the content I am writing right now may be picked up within a matter of seconds by another site that wants to "borrow" or steal the content. So now we have two (probably a lot more than two) sources with identical content. A search engine can say, hey, I found source A before I found source B with this particular content, so source A must be the original source. But if you think about that, since spiders don't work in real time, a search engine may visit the source that "borrowed" the content prior to visiting the original source of that content.
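The crawl-order problem is easy to show in a few lines of Python. This is a toy model with invented URLs and timestamps, and the "earliest crawl wins" rule is the naive heuristic described above, not a description of how any search engine actually attributes content.

```python
from dataclasses import dataclass

@dataclass
class Crawl:
    url: str
    published_at: int  # minutes after the original went live (unknown to the crawler)
    crawled_at: int    # minutes after the original went live (all the crawler knows)

def guess_original(crawls):
    """Naive heuristic: whichever copy the spider saw first is the original."""
    return min(crawls, key=lambda c: c.crawled_at).url

# The scraper copied the post one minute after it was published, but its site
# happened to be spidered 90 minutes before the original source was.
crawls = [
    Crawl("http://original.example/post", published_at=0, crawled_at=120),
    Crawl("http://scraper.example/copy", published_at=1, crawled_at=30),
]

print(guess_original(crawls))  # the heuristic picks the scraper's copy
```

Because the crawler only ever sees `crawled_at`, no amount of cleverness in the comparison fixes the heuristic; it needs some other signal about who published first.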

Chris offers two suggestions. The first is to watch your crawl cycles in Google and post the content just before the next crawl is due. Now that is not really feasible, as Chris knows, because there is no way to know exactly when Google will crawl your site, and news must be posted as soon as possible, so waiting is normally not an option. Chris uses this example to make a point, I believe. The other option Chris suggests is to use Google Sitemaps, so Google can see you as a trusted source and be fed the information sooner rather than later.

But what do you think is the algorithmic solution? I personally do not know. There are people discussing the fundamental challenge at Search Engine Roundtable Forums.

Posted by Barry Schwartz at 9:29 AM | Permalink

June 6, 2006

Googlebowling A Reality?

Googlebowling is a term used to describe the method of knocking out a page from the Google search results. Googlebowling is conducted by linking to a particular site from sites within bad neighborhoods. Rand over at SEOMoz.org posted recent information he learned about Googlebowling while at SES London a week ago.

To successfully deploy Googlebowling, Rand writes that you need to "use patterns that would show that the site has 'participated' in [a spammy linking] program."

Specifically, this means you would point spammy links at the places the site you are targeting links to. If this is implemented properly and the site you are targeting is not a super authority, the site may be penalized for a long time. Note that the advice here is given not to encourage Googlebowling but to help people understand how it might be possible to impact their own sites.

Rand continues to explain that if a site is Googlebowled, you most likely will want to start fresh and drop the site that was penalized completely. I have discussed Googlebowling a few times at the Search Engine Roundtable. Two entries I would like to point out are:

+ Google Bowling For Dollars by Chris Boggs
+ Google Bowling Supporters Thread by myself

So can other people hurt your rankings? Can other links hurt you? Some think they can, but some, such as Google itself, say they cannot.

Posted by Barry Schwartz at 10:15 AM | Permalink

June 2, 2006

Reputation Management: How To Handle Saboteurs

The [failure] GoogleBomb had become well-known enough to have seen Marissa Mayer post a response on the Google company blog last September. I first heard the phrase "Reputation Management" as applied to search from Heather Lloyd-Martin during a private conversation a long time before this. It was obvious Heather was on to something because we've all seen search results that produce unexpected listings. David Dalka recently posted his frustration that Googling his name could confuse searchers into thinking he is a millionaire. This may be a personal example, but what if you have a bona-fide saboteur?

Heather recently related to me her experience with a client where a saboteur took the client company name, mixed it with adult content, and auto-generated unsavory posts published across the Web in numerous blogs and forums. Needless to say, search results for that company started looking really bad, and at times, the whole set of results was flooded with what looked like adult listings.

Heather now regularly points out examples of big brands that could use reputation management as regards their search listings. She presents screen shots at conferences showing Google queries for uhaul and victorias secret having results at number 3 and number 2 respectively that read: "UHaul made my move a miserable and stressful experience" and "Victoria's Dirty Secret."

The dirty secret site has an image with an "angel" holding a chain saw. The site makes it sound as if whole forests are regularly depleted because the cataloger lacks environmental awareness. What can you do when this happens?

You certainly have little control over the natural rankings of saboteurs unless they spam. You can easily choose to hand spammers that pollute your rankings over to search engine quality assurance teams when they use tactics that would have them removed. In the case of the dirty secret site, it appears the other extreme is occurring. The campaign for environmental change at Victoria's Secret may be working. Perhaps Victoria's Secret will establish more earth-friendly contracts with their suppliers.

Another thing you can do is publish pages telling your side of the story in the hope of getting natural rankings that counteract the negative spin. You needn't wait for natural rankings to appear, either; you can purchase sponsored listings to drive users to the new pages straight away. At least in the meantime your presence can be felt on those most troubling queries should they begin to affect your image in search results.

Postscript: David's personal example caused him some grief. Consider the amount of grief an "eBay Avenger" causes the young fellow who, it looks like, fell victim to an angry buyer who decided to make an example of him. Even if the allegations later prove to be false, and although the eBay avenger has publicly offered to take down the site, SERPs for his name will likely be damaged for a long time to come (Google, Microsoft and Ask too).

Posted by Detlev Johnson at 4:44 AM | Permalink

May 30, 2006

Google Sitemaps: Links To You Can't Hurt You

The Google Sitemaps team posted to their blog in response to a question at SearchEngineWatch Seattle. Interestingly, they note that links from bad neighborhoods do not harm a site's rankings, only links to bad neighborhoods. It has long been theorized that links from bad neighborhoods do cause ranking problems, so this statement goes against conventional thinking.

Link networks often populate quality content sites with paid text links as part of their program. If at all possible, Google obviously wouldn't want to remove quality content from their search engine. One solution is to make outbound links from quality sites that sell links worth nothing towards building rankings for destination sites.

We've heard this from Matt Cutts before: "Link-selling sites can lose their ability to give reputation (e.g. PageRank and anchortext)." If a link from such a site loses its ability to transfer PageRank, it can make sense that it doesn't harm a site's PageRank either. But that is not a foregone conclusion. The information comes from the Sitemaps team, and not Matt Cutts' anti-spam force.

In the above entry by Matt, he recommends the use of the "nofollow" link attribute to safely purchase links purely for traffic purposes. This suggests that links from bad neighborhoods can indeed harm a site's rankings in Google. Perhaps Matt implies this to deter link buying, but the advice is good insofar as links from bad neighborhoods also raise the profile of sites that eventually would come under scrutiny by Google. It can also be assumed that text links from bad neighborhoods can harm a site's rankings in major search engines other than Google.
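
For anyone who wants to audit their own pages for this, here is a minimal sketch in Python that lists external links that do not carry rel="nofollow". The class and function names are my own, and the check is deliberately simple (it only compares host names), so treat it as an illustration rather than a complete link audit.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkAuditor(HTMLParser):
    """Collects external links that do not carry rel="nofollow"."""

    def __init__(self, own_host):
        super().__init__()
        self.own_host = own_host.lower()
        self.followed_external = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href") or ""
        host = urlparse(href).netloc.lower()
        if not host or host == self.own_host:
            return  # relative or same-site link, nothing to check
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "nofollow" not in rel_tokens:
            self.followed_external.append(href)


def followed_external_links(html, own_host):
    """Return hrefs of external links in html that lack rel="nofollow"."""
    auditor = LinkAuditor(own_host)
    auditor.feed(html)
    auditor.close()
    return auditor.followed_external
```

Running followed_external_links() over a page that carries purchased links gives a quick list of the ones that still pass reputation.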

Posted by Detlev Johnson at 8:22 AM | Permalink

May 25, 2006

Duplicate Content Detection Tool

I reported this morning about a new tool that checks your site to see how much duplicate content you have throughout your site. As many of you know, duplicate content is a major issue for many SEOs today. This tool will hopefully give you the ability to catch any duplicate content issues before they become serious. The tool is named Site Wide Duplicate Content Analyzer.
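
I don't know how the tool measures duplication internally, but one common way to compare two pages is to break each into overlapping word sequences ("shingles") and measure how many they share. The sketch below assumes that approach; the function names and the three-word shingle size are my own choices, not anything the tool documents.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(text_a, text_b, size=3):
    """Share of shingles two texts have in common, from 0.0 to 1.0."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

A score near 1.0 for two pages on the same site suggests they are close to duplicates; where to draw the line is a judgment call.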

Posted by Barry Schwartz at 9:18 AM | Permalink

May 24, 2006

Search Spam Detection Tool: How White Hat Is Your Web Site?

Nathan Weinberg spots a tool named Search engine spam detector. The tool looks at a particular URL and classifies what elements on the page may raise a spam flag at a search engine. So let us test it out on the SEW Blog, shall we? :)

According to the tool, this site is completely spam free. The tool hasn't found any invisible text, the tool has not detected any unnatural text, the tool hasn't found any significant keyword stuffing tendency in HTML code, and the tool hasn't found any doorway farm.

So the SEW blog is more white hat than SEroundtable.com, where I got flagged for invisible text.

Posted by Barry Schwartz at 8:49 AM | Permalink

May 10, 2006

Google Ban Checker Tool

This morning, I reported on a tool that allows you to check if you are banned in Google. The tool is a desktop application that searches Google using a site: command and also checks sites that link to you, to see if they are banned as well. You can check out the tool by clicking here. Keep in mind, Google also can notify you of some site penalties with Google Sitemaps.

Posted by Barry Schwartz at 9:38 AM | Permalink

May 8, 2006

Expedia-Hosted Domains Spamming

SEO Black Hat reports that it appears Expedia France is spamming the search engines. What these appear to be are hosted spam pages on the expedia.fr domain name. If you do a search at Google for buy viagra you will currently notice that buyviagra.blog.expedia.fr is the 2nd result. There are many other examples of these pages; in fact, my blog has been denying comment spam from all sorts of Expedia France subdomains including homeequitylineofcredit.blog.expedia.fr. This may just be some sort of Expedia hack, where spammers buy the subdomain from Expedia, to do what they want with it.

Posted by Barry Schwartz at 10:01 AM | Permalink

May 4, 2006

BMW Sort Of Hits Back At Google In Ad

BMW Ad With Google over at Google Blogoscoped gives me a chuckle for a variety of reasons. It shows an ad from BMW saying "The Search For Yourself Doesn't Run On Google." The irony! It comes after BMW was banned by Google briefly plus after Pontiac tapped into the Google brand to sell cars. Specifically:

  • Welcome Back To Google, BMW -- Missed You These Past Three Days covers how Google banned BMW Germany for spamming, knocking it out of the index for a few days. I'm sure the ad is just a coincidence, but it's sort of funny to see a pseudo-slam against Google following on this.  

  • TV Commercial "Googles" Pontiac covers how Pontiac embraced the Google name, with Google's permission, to help push its cars by tapping into the Google brand positively. Now here's BMW using the Google brand I'd say negatively to push its motorcycles. And you've got to wonder if they got (or needed) permission to use the Google name (it's only the name used, not the trademarked logo).

Posted by Danny Sullivan at 7:12 AM | Permalink

April 26, 2006

Google Sitemaps Adds Spam Checking, New Webmaster Help Center & Other Features

I just came out of the Meet the Crawlers session, where Google announced new features and a new layout for Google Sitemaps. The Sitemaps blog just posted the details as well. One huge feature is that Google tells you if your site is in the index or not and if it is not, they won't tell you why.

Here is a break down of the new features:

  • New verification method
  • Indexing snapshot
  • Notification of violations of the webmaster guidelines
  • Reinclusion request form
  • Spam report
  • New webmaster help center
  • More about our new look
  • Adding a Sitemap
  • Navigating the tabs

Full feature list at sitemaps blog.

Postscript: Matt Cutts just pinged me to let me know he has posted an entry named Notifying webmasters of penalties. That entry explains that the Google Web Search Team and Google Sitemap Team are working together to notify "some (but not all)" webmasters of Google site penalties.

Posted by Barry Schwartz at 1:59 PM | Permalink

April 20, 2006

Conference Coverage: PubCon 2006 & SES Japan 2006

PubCon has been happening out in Boston, while Search Engine Strategies is going in Japan. Here's a round-up of some coverage on search-related sessions:

Want to comment or discuss? Visit our SEW Forums thread, PubCon Boston 2006.

Posted by Danny Sullivan at 8:35 AM | Permalink

April 13, 2006

Traffic-Power Case Against SEO Book Dismissed

A bit of catch-up: Aaron Wall of SEO Book notes that the case against him filed by Traffic-Power.com was tossed out of court on jurisdiction issues. Traffic-Power has 30 days to appeal, but Aaron's hopeful this means the case is over. The case against Traffic Power Sucks has yet to be resolved, he also notes. For background on the Traffic-Power suits against both TrafficPowerSucks and SEO Book's Aaron Wall, see these past posts:

Postscript: Actually, Aaron writes to clarify the appeal time has already passed. The case was tossed out on February 13, so the 30-day period for appeal has elapsed.

Posted by Danny Sullivan at 8:44 AM | Permalink

April 11, 2006

Matt Cutts Requests Spammers To Document Search Spam Techniques

Matt Cutts of Google posted a funny entry where he notes that he will be on the program committee for AIRWeb, Adversarial Information Retrieval on the Web, this year. He sarcastically asks search spammers to submit their tricks and ideas on how to spam search engines. If you really want to submit your techniques, the call for papers can be found here.

Posted by Barry Schwartz at 8:50 AM | Permalink

April 7, 2006

MSN, Yahoo Seek SEO People To Help Them Rank Better

There's still the occasional person who I encounter who thinks that SEO overall is somehow wrong to do or something the search engines frown upon. Yahoo!, MSN & Ebay recruiting - SEO hits the big time is an example of why this isn't so. It covers how Yahoo, MSN and eBay in the UK are all recruiting internal SEO people to help promote their own sites.

Such hirings aren't new. We've long had search companies themselves trying to rank well in other search engines, to the point of hiring people internally or externally to make it happen. But it's a nice reminder for everyone to keep in mind.

Personally, I got a chuckle out of the breakdown Threadwatch did of the MSN UK recruitment ad. Wanted: Spammer-in-chief for MSN over there highlights some of these key success metrics for MSN UK's SEO person:

  • Achieve x% of traffic from Search engines within our top channels and homepage
  • Achieve x% of pages cached/listed within Google and Yahoo
  • Achieve first page ranking in Google/Yahoo for major channel entry points and other important MSN content areas
  • Beat Yahoo! on listing results in Google on major events
  • Demonstrate clear traffic improvement as a result of implementation of SEO techniques
  • Ensure all new pages are SEO compliant

As for Yahoo, I found these points interesting:

  • All title / meta data tagging of UK products
  • URL specifications of UK products
  • URL redirect mappings
  • Producing SEO recommendations and documentation for UK products
  • Fostering development of effective SEO tools / reporting process
  • Maintaining existing Yahoo! UK products search engine rankings
  • Identifying and flagging potential spam violations on UK products
  • Identifying and flagging inappropriate Y! content that is accessible via search
  • Weekly / monthly SEO reporting & statistics for UK products
  • Keyword / log analysis reports for UK products
  • Keeping web development, engineering and production up-to-date with developments in the SEO field

Note the part I bolded. Nice to see that Yahoo UK wants to ensure no one suddenly accuses it of spamming itself or another search engine. Nah, such things never happen. Wait a minute: Google Admits To Cloaking; Bans Itself. That was from last year, but to be fair, it was pretty much an accidental thing.

Posted by Danny Sullivan at 9:30 AM | Permalink

April 5, 2006

MSN Web Spam Patents Applications & Algorithms Explored

Bill Slawski has an excellent write up on web spam through the eyes of patent applications and published papers. During Bill's research, he found PageTurner by Microsoft, which not only looks at how to establish a crawl frequency of specific Web pages, but also identifies "duplicate and near duplicate content on web pages." From one of the papers Bill referenced in the post, he notes the usage of the words "crafty porn." That leads him to a patent application we referenced last week named content evaluation by Microsoft. Anyway, Bill really digs deep into these algorithms and patent applications with links and abstracts pulled of content and video presentations. Read the full blog entry entitled Fighting web spam with algorithms.

Posted by Barry Schwartz at 9:09 AM | Permalink

March 7, 2006

Writing for Search Engines by Copying and Pasting

ClickZ links to a great undercover project by the Wall Street Journal named Our Columnist Creates Web 'Original Content' But Is in for a Surprise. The article is written by a columnist who went undercover and was hired by Web "publishers" that want so-called "original content" for ranking well in search engines. The writer explains how he was hired to write 50 articles, each 500 words long, for a total sum of $100. In the end, the "publisher" wanted plagiarized copy for his 50 articles.

To make a long story short, he spent days researching and writing one article and sent it to the client, who said it was written well but wanted to break up his original article into smaller, more keyword-phrase-specific articles. The client sent an example of an article to the WSJ columnist, who noticed that it was plagiarized not from one site, but from several well-respected sites, including World Health Organization, New Scientist and WebMD sites. The client wanted the columnist to flip around from site to site and copy pieces of content from popular sites, and paste them together to make "original content."

All in all, he blames the search engines for allowing this. He provides the following analogy: "In fact, search engines are more like a TV camera crew let loose in the middle of a crowd of rowdy fans after a game. Seeing the camera, everyone acts boorishly and jostles to get in front. The act of observing something changes it."

To be fair, I just wrote an article this morning at the Search Engine Roundtable named Writing Articles That Get Links. I explain in that article that the copy-writing for search engines is getting old and will eventually be figured out by the search engines. For articles, today, to get links, to rank well, they must be written with soul and emotion. You have to care about what you are writing for people to want to link to them. You see these patterns happening at Google today. Of course there are hundreds of examples of pages that catch the long-tail of search terms that are plagiarized - but you will notice (1) less and less of this in the future (2) and/or longer-tailed keywords being targeted with these articles. Both of which reduce the likelihood of a searcher locating such articles.

Posted by Barry Schwartz at 9:39 AM | Permalink

March 6, 2006

Hosted Doorway Pages & Paid Links Back At Stanford University

Bouncing ball time. Last April, I wrote about how the Stanford Daily newspaper was selling links for those seeking to rank better on Google, ironic given that Google was born out of Stanford University and is very anti-link selling. Then last May, the newspaper decided to abandon paid links along with doorway pages it hosted for third parties. Today, SEW Forums moderator AussieWebmaster notices that paid links and hosted web pages have come back, such as you'll see at the bottom of the paper's home page here and a hosted page here.

Nope, I don't see the use of nofollow as Google's Matt Cutts recommends, nor is the page banned by robots.txt from being indexed. Far from it, it's ranking well.
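
If you want to run the same robots.txt check on a page you host or link to, Python's standard library can do it. This is a minimal sketch: the helper name, the default user agent and any paths you pass in are placeholders of mine, not the actual Stanford Daily rules, which I haven't reproduced here.

```python
from urllib.robotparser import RobotFileParser


def is_crawlable(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Pass in the text of a site's robots.txt and the URL of the hosted page; a True result means nothing in the file is keeping crawlers away from that page.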

AussieWebmaster -- Frank Watson -- oversees a site for currency trading terms, which is why the Stanford-hosted page came to his attention. It currently ranks fifth out of about 40 million pages that Google has indexed for the term forex.

Well heck, at least the page carries Google's own AdSense ads on it :)

Want to comment or discuss? Visit our Search Engine Watch Forums thread, Paid Links, Hosted Doorway Pages Back At Stanford Daily.

Posted by Danny Sullivan at 2:34 PM | Permalink

February 21, 2006

Getting Reincluded In Yahoo Via Paid Inclusion Tough? Try The Independent Reinclusion Forms

Rand at SEOmoz shows his frustration with Yahoo paid reinclusion. He tells a story of a client who hired him to clean up his site after being banned from Yahoo. Rand's team did just that and, after using the paid Sitematch program for reinclusion, the site was denied. Rand posts his conversation with a Yahoo representative, where he shows that even though the site is cleaned up, Yahoo has it on a list that doesn't allow it to be reincluded. I tend to see these posts and complaints arise weekly in various SEO forums, so this is far from an isolated case.

They won't tell Rand if there are issues with the current site; all they can say is that the site does not "meet our [Yahoo] quality guidelines requirements." For the full effect of the "absurdity" in the conversation, read the entry.

So what does one do if they are in Rand's position? First try following the reinclusion tips Danny worked up back in June of 2005. If that fails, I have reported that a fairly unknown Yahoo Second Review Request form works wonders in getting sites reincluded into Yahoo.

Posted by Barry Schwartz at 11:08 AM | Permalink

February 13, 2006

Google Officially Confirms Traffic-Power Ban From Index

Google's Matt Cutts has provided official confirmation of a ban on the Traffic-Power domain name and some Traffic-Power client sites. Matt writes about how Google hasn't usually confirmed or denied if a company has been banned in the past, but it's a policy now changing in cases where Google finds it useful to help educate site owners and others. As for Traffic-Power, Matt wrote:

I can confirm that Google has removed traffic-power.com and domains promoted by Traffic Power from our index because of search engine optimization techniques that violated our webmaster guidelines at http://www.google.com/webmasters/guidelines.html.

Matt's post -- which he notes was reviewed by Google's lawyers -- was in reaction to a recent court filing in the case of Traffic-Power versus TrafficPowerSucks. As Threadwatch notes, the filing by Traffic-Power alleges that TrafficPowerSucks has made false and defamatory claims including:

a. Claims that the search engine giant Google has banned and is banning from its search engine listings websites of Traffic-Power.com clients because of the search engine optimization strategies used by Plaintiff.

b. Claims that clients of Traffic-Power.com run the risk of being banned from Google search engine listings if they use Traffic-Power.com services

Fair to say, TrafficPowerSucks now has some pretty powerful evidence to refute the Traffic-Power allegations.

For background on the Traffic-Power suits against both TrafficPowerSucks and SEO Book's Aaron Wall, see these past posts:

Want to comment or discuss? Visit our SEW Forums thread, Traffic Power Files Suit Against SEO Book.

Posted by Barry Schwartz at 9:11 AM | Permalink

February 8, 2006

Welcome Back To Google, BMW -- Missed You These Past Three Days

I said BMW would be back soon after they got banned on Saturday. Matt Cutts over at Google lets everyone know they are now back in. So, they got a three day slap on the wrist. It demonstrates once again how public spam reports can be so effective and how big major web sites really don't get the "death penalty," when it comes to spamming.

Spam always seems to get removed faster after a big dose of publicity. Back in 2003, I wrote Google Kills eBay Affiliate Spam Quickly, Others Survive for Search Engine Watch members that looked at how an eBay affiliate using doorway pages was quickly removed by Google after public exposure. In contrast, people still complain that nothing happens when they file spam reports with major search engines through official spam reporting feedback forms.

BMW's situation proves once again that the best spam antibiotic is a good topical application of publicity. So did you spot spam? Blog away. Get others to blog, and that will probably help get the spam removed.

Are you spamming? If you're not hiding your tracks well, be forewarned that the publicity monster might roll over you at some point. On the flipside, we'll eventually have so many public spam reports that not all of them will be dealt with.

For example, More European Automaker Sites Do Doorways & Should Search Engines Be Able To Enforce Spam Rules? on the blog from yesterday covered spamming spotted at Porsche Denmark and Chevrolet Sweden, but those two automakers remain listed. I expect they probably will remain listed, too. If BMW took a ding for being banned, Google took some hits from those who feel spam removals ought to happen after a warning. Google's probably thinking about ramping up the spam notification program it was testing before wiping out any more big-time sites that might push back on no-warning wipeouts.

Meanwhile, a second spam truism gets proven. Big companies hardly face a "death penalty" on Google. They get back in and fast. Let's do some timings. In the Spam Olympics event of getting back in after being banned, we have....

  • WhenU: Banned in 2004, back in after 42 days  
  • WordPress: Banned in 2005, back in after 2 days or less  
  • BMW: Banned in 2006, back in after 3 days

What if you aren't a big company? Matt covered the timeline on getting back into Google in his prior Filing a reinclusion request post.

How long do you have to wait now? That depends on when Google reviews the request and on the type of spam penalty you have. In the days of monthly index updates it could take 6-8 weeks for a site to be reincluded after a site was approved, and the severest spam penalties can take that long to clear out after an approval. For less severe stuff like hidden text, it may only take 2-3 weeks, depending on when someone looks at the request and if the request is approved.

So while BMW was upset that Google didn't give them a heads-up about being banned, at least they didn't have to wait 2-3 weeks to get back in. Over at Matt's blog post, you can see some of the people commenting who aren't happy with such express service. Matt responds:

Our main goal has to be to give the most relevant results to our users; there is currently a trade-off between taking action to remove spam from our index vs. removing sites that lots of users look for with navigational queries.

That brings me back to the advice I've long given to those thinking of skirting search engine guidelines. How big do you think you are? If you really think you're running a crucial site, you can sin against Google and gang and probably be forgiven in short order. They do need you. Absolution will be provided. Maybe they'll put you back in so that you don't rank well for generic searches, but you'll be back in and fine for navigational ones.

Running some small web site that no one's going to miss? Don't expect express treatment nor gamble you'll be reincluded.

Meanwhile, Barry points to a WebmasterWorld thread finding that the same thing that got BMW banned is still happening. Well, not quite. As Philipp at Google Blogoscoped points out, the pages are gone from the live site but Google is still retaining cached copies of them. Those cached pages should be dropped over time.

Want to comment or discuss? Please do! Visit our Search Engine Watch Forums threads Google Removes BMW Germany For Spamming or BMW debacle good for SEO?

Posted by Danny Sullivan at 10:40 AM | Permalink

February 7, 2006

More European Automaker Sites Do Doorways & Should Search Engines Be Able To Enforce Spam Rules?

Dave Naylor's been doing a tour of European automotive sites and finding others that are doing the doorway page dance that got BMW banned from Google. Meanwhile, there's some concern in the blogosphere about whether people should be worried about Google's spam rules in general. A look at both issues, below.

Dave's found this page over at Porsche Denmark that redirects to the Porsche Denmark home page. Disable JavaScript (use this handy tool for Firefox), and you can see the underlying textual content that's being cloaked.

It's hard to know what exactly is going on, as I don't read Danish. Since you can't get to this page from the Porsche Denmark home page -- and since it redirects to that home page -- it seems designed mainly to capture searchers looking for a particular topic and route them into Porsche. In other words, a classic doorway page operation.

Here's a better example. Look for klassiske porscher on Google, then you get this page, which redirects to the home page. Disable JavaScript, and the redirection stops, showing you the hidden content. A user never sees that. Porsche has no intention for them to see it. They only want Google to see it, to rank the page well and deliver them a user to a completely different page on the site.
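
The pattern described here (a script that bounces the visitor elsewhere, sitting on top of a page full of text the visitor never reads) can be checked for roughly in code. What follows is only a heuristic sketch in Python: the function name, the regular expression and the 50-word threshold are my own assumptions, it will miss obfuscated redirects, and it is certainly not how Google detects such pages.

```python
import re
from html.parser import HTMLParser

# Matches simple script redirects such as `window.location = "..."`,
# `location.href = "..."` or `location.replace("...")`.
REDIRECT_PATTERN = re.compile(
    r"\blocation(?:\.href)?\s*=(?!=)|\blocation\.(?:replace|assign)\s*\(",
    re.IGNORECASE,
)


class PageScanner(HTMLParser):
    """Separates script source from the text a script-less client would see."""

    def __init__(self):
        super().__init__()
        self._skip = None          # "script" or "style" while inside one
        self.script_source = []
        self.text_words = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = tag

    def handle_endtag(self, tag):
        if tag == self._skip:
            self._skip = None

    def handle_data(self, data):
        if self._skip == "script":
            self.script_source.append(data)
        elif self._skip is None:
            self.text_words += len(data.split())


def looks_like_js_doorway(html, min_words=50):
    """Flag pages that both redirect via script and carry substantial text."""
    scanner = PageScanner()
    scanner.feed(html)
    scanner.close()
    has_redirect = bool(REDIRECT_PATTERN.search("".join(scanner.script_source)))
    return has_redirect and scanner.text_words >= min_words
```

A True result only means the page deserves a look with JavaScript turned off; plenty of legitimate pages redirect for good reasons.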

In the comments on Dave's post, David Thulin points to this page at Chevrolet Sweden. Use that tool I mentioned above and disable styles. Now the pretty picture of a Chevy goes away, replaced by hidden text. My Swedish is as good as my Danish -- i.e., I can't read this. But it doesn't seem spammy in terms of repetition. Still, scroll to the bottom, and you'll see links to additional doorway pages. Someone clearly realizes search engines don't like the graphical pages they are feeding out, so they've created a series of doorway pages. That degree of savviness also means they should be aware that search engines generally don't like doorways.

Of course, the entire BMW situation has sparked some interesting pushback in new quarters, people who feel like Google in particular shouldn't be pushing "orthodoxy" or their own results on site designers. Google Orwellian at Publishing 2.0 is one example (I left some comments there), Death Penalty, Investigations? Sounds like the FBI... is another and Google Delists BMW-Germany at Slashdot has some similar comments. Jeremy Zawodny has some pushback of his own on the pushback over here: Google vs. BMW, a sanity check.

I think some of the outcry is mistaken. Google is simply doing what all search engines do, enforcing its own rules on what spam is. That's not anything new or Google specific. Sure, it does warrant examination. Then again, it has also been heavily debated in the past. Not everyone agrees with spam rules, but even those who don't understand that if they do something against the rules, they risk getting tossed out. But perhaps the times are a-changing...

For those looking to educate themselves on spam issues, here's a reading list:

  • A Bridge Page Too Far? - From 1998, covers one of the earliest outings of a big company using doorway pages, State Farm.  
  • What Are Doorway Pages? - Originally written as a companion to the article above, I last updated this in 2001, and it's still fairly useful. It gives you an idea of how old school some of the spam tactics the automakers are using are.  
  • FTC Steps In To Stop Spamming - From 1999, covers how the US government stepped in to stop one of the worst cases of search spam, when content is used to mislead people (in this case, searches for things like "kids internet games" lead to porn).  
  • Pagejacking Complaint Involves High-Profile Sites - From 2000, similar to the above, covers the issue of content being stolen from a site, cloaked and used to gain rankings. It was more useful in the days before link analysis, when on-the-page factors counted for more.  
  • Ending The Debate Over Cloaking - From 2003, a very long look at what cloaking is, why not everyone agrees it is necessarily evil despite search engine rules and how the focus probably should be on the content rather than the technical delivery structure.  
  • Spam Rules Require Effective Spam Police - From 2004, revisits how search engines have various spam rules but also how they don't disclose if someone's been yanked from an index, something that would probably help site owners.  
  • The Great Doorway Debate - From 2004, a long debate in particular on whether doorway pages (like those the automakers are using) should be considered spam.  
  • Whitehat vs. Blackhat, It Is All BS - From 2004, a long debate on our Search Engine Watch Forums about what spam is, whether there are bad tactics and so on.
  • Working With Google Scholar -- And More Approved Cloaking - From 2004, covers how cloaking isn't so bad if Google decides it helps users.  
  • What, Exactly, is Search Engine Spam? - From 2005, short, to-the-point rundown on some of the things search engines frown upon.  
  • Comment Spam? How About An Ignore Tag? How About An Indexing Summit! - From 2005, covers in part how designers are questioning anew why they should worry about what search engines think.  
  • Talking About Search Engine Spam - From 2005, summarizes a discussion on "white hat versus black hat" tactics and how in my view, intent rather than actual tactics may define what's spam. The summary leads to a long review of the session for Search Engine Watch members.  
  • Google Admits To Cloaking; Bans Itself - From 2005, shows that if Google's following orthodoxy, at least it's happy to ban itself for violating that.  
  • Is Cloaking Deceptive Advertising? Not Necessarily - From 2005, looks at why cloaked content doesn't necessarily spoil the "level" playing field some believe happens in search engines.  
  • WordPress Caught Spamming After Enlisting To Fight Spam - From 2005, looks at doorway spam that was on the WordPress site and how large, important sites caught up in spamming tend not to be penalized for very long.  
  • White Hat - Gray Hat - Black Hat - From 2005, summarizes even more articles and forum discussions on what spam is, should search engines enforce rules more strongly, is going against guidelines unethical -- you name it!  
  • Worthless Shady Criminals: A Defense Of SEO - Covers why designers would be foolish to ignore the "third browser" of search engines. You might not like the rules; you might think search engines should somehow magically understand what your all-image web page is about. But you could also complain that radio needs to change because it refuses to play the pictures in your television ad. Rather than trying to work around the rules, first consider if you can build a web site that pleases humans and search engines at the same time. Plenty of people do -- and often end up with more usable web sites, as a result.  
  • Google Testing Notification Of Banning To Webmasters - Covers Google experimenting with warning site owners if they are doing something against the rules.

Need yet more? The SEO: Cloaking and SEO: Spamming categories of the Search Topics area available to Search Engine Watch members take you back for years with articles on these topics. Plus, becoming a member helps support the site and the creation of content like you're reading right now.

Want to comment or discuss? Please visit our Search Engine Watch Forums thread, Google Removes BMW Germany For Spamming.

Posted by Danny Sullivan at 9:32 AM | Permalink

February 6, 2006

Google Bans BMW Germany, Ricoh Germany

Last week, I wrote about BMW Germany being spotted spamming search engines. Google's Matt Cutts posted on Saturday that the site is now out of Google -- and that Ricoh Germany would also be removed for spamming (it's out now). The move has sparked what I'd call unprecedented coverage by mainstream publications on a spam removal (BBC, Forbes, London Times, Financial Times, Sydney Morning Herald).

The Financial Times and Forbes articles are especially worth reading. BMW criticizes Google for not contacting it first in the Financial Times:

"Google has decided to spread this information which has created this, I'd almost say, media hype," [BMW] said. "They spread it on Saturday, a few days after the pages had been taken off. They hadn't talked to us beforehand which we found a bit surprising."

Hey, how about not having allowed it in the first place? If it wasn't out for the public to see and discuss, BMW wouldn't have an issue.

No doubt smaller companies and individual webmasters will be heartened by the fact that even big companies like BMW can get banned at Google. The reality is, however, that they'll be back in soon. Most big companies that get banned are put back in quickly because searchers expect to find them for navigational queries. I cover that more in our Search Engine Watch Forums thread, Google Removes BMW Germany For Spamming.

Whether BMW will now face a public relations black eye for having spammed Google remains to be seen. I kind of doubt it, but I'm pretty cynical about these things.

Meanwhile, Dave Naylor spots that BMW France apparently has spam issues as well. Somehow, I suspect the talks that Google and BMW are having right now mean they might escape the axe if the spammy stuff is quickly removed.

Finally, just a chuckle. Matt's reported by Forbes as being "a blogger who purports to work for Google," while the London Times calls him "a blogger claiming to be a Google software engineer."

Posted by Danny Sullivan at 3:31 PM | Permalink

February 2, 2006

BMW Germany Dinged For Doorway Spam

How about a little old skool search spam? OK, Philipp over at Google Blogoscoped points out in BMW's Doorway Pages that the automaker is employing hidden content on its web site in Germany. Search engines see one thing; humans another. It's cloaking, but not IP based, more of the "poor man's" variety that uses JavaScript.

Look at this page. You'll see nice, pretty pictures of BMWs. Philipp then illustrates how when you turn off JavaScript, you get a page of completely different content, including the use of the word "used car" 42 times in what appears to be a gibberish doorway page. That's a page where the sentences may look like they are saying something to a search algorithm but make no sense to a human reader.

When I disable JavaScript, I actually don't see what Philipp got. Instead, I get a page telling me that I need to have JavaScript to view the site. Part of this seems to be due to some redirection going on. But here's an easy way to see what Google's actually being shown -- this page from the Google cache.
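If you want to run this kind of comparison yourself, here's a rough sketch. It assumes you've already saved the visible text of both versions of a page (with and without JavaScript) and simply counts how often a phrase turns up in each. The function names and sample strings are made up for illustration; this isn't how Philipp or Google checked the BMW page.

```python
import re


def count_phrase(text: str, phrase: str) -> int:
    """Count case-insensitive, whole-phrase occurrences of `phrase` in `text`."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def compare_versions(with_js_text: str, without_js_text: str, phrase: str) -> dict:
    """Report how often `phrase` appears in each version of a page."""
    return {
        "with_js": count_phrase(with_js_text, phrase),
        "without_js": count_phrase(without_js_text, phrase),
    }


# Hypothetical sample text standing in for the two versions of a page.
with_js = "The new BMW lineup. Find a dealer near you."
without_js = "Used car offers: used car prices, used car dealers, buy a used car."

print(compare_versions(with_js, without_js, "used car"))
```

A big gap between the two counts doesn't prove spam on its own, but it's a quick signal that the two audiences are being shown different pages.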

Posted by Danny Sullivan at 12:19 PM | Permalink

December 22, 2005

What's A Scraper Site

It's easy to assume that everyone knows what a scraper site is. Not everyone does -- or at least, they know what a scraper site is but not what such sites are commonly called. Scraper Sites and SE Ambiguity: What is Your Sites Reading Level? from Stuntdubl gives you a nice rundown on how scrapers grab search results to make "content" that's typically host to Google AdSense ads -- and asks the same question on the minds of many: why does Google fund this junk?

Posted by Danny Sullivan at 10:04 AM | Permalink

December 13, 2005

Yahoo's Jeremy Zawodny Caught In Link Selling Debate

Google Fights Paid Links & Yahoo Defends Paid Links from Barry over at Search Engine Roundtable does a great job of recapping the ironic situation of Yahoo blogvangelist Jeremy Zawodny selling links on his personal blog without using nofollow attributes while the most direct counterpart he has at Google, Matt Cutts, has been urging for months that nofollow should be used on paid links.

While Barry's done the recap, I still wanted to revisit things myself. First, there's the nofollow attribute, which was introduced earlier this year primarily as a way for blog owners to help combat trackback and comment spam. Slap a nofollow on links in these areas, and they don't pass credit for the search engines that support nofollow.
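For those who haven't looked at the markup, nofollow is just a value in a link's rel attribute, as in rel="nofollow". Here's a minimal sketch, using only Python's standard library, of how a site owner might audit a page for links that do and don't carry it. The class and function names and the sample HTML are my own illustrations, not anything the search engines publish, and it says nothing about how an engine actually treats each link.

```python
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Sort the links on a page by whether they carry rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.followed = []    # hrefs without nofollow
        self.nofollowed = []  # hrefs with nofollow

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href is None:
            return
        # rel can hold several space-separated values, e.g. "external nofollow".
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollowed.append(href)
        else:
            self.followed.append(href)


def audit_links(html: str):
    """Return (followed, nofollowed) lists of hrefs found in `html`."""
    parser = LinkAudit()
    parser.feed(html)
    parser.close()
    return parser.followed, parser.nofollowed


# Hypothetical sample markup.
sample = (
    '<p><a href="http://example.com/a">A</a> '
    '<a href="http://example.com/b" rel="nofollow">B</a> '
    '<a rel="external NoFollow" href="http://example.com/c">C</a></p>'
)
followed, nofollowed = audit_links(sample)
print("followed:", followed)
print("nofollowed:", nofollowed)
```

Anything that lands in the followed list is a link that can still pass credit as far as the markup is concerned.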

Todd Friesen dubbed nofollow to be a "link condom" (see Link Condom: The Nofollow Parody), a way to interact through links with other sites safely but not actually touching them, at least as Google, Yahoo and MSN will view it. But far from a joke, I later wrote a follow up on how the link condom parody site was a good jumping off point on how nofollow had many other uses, including as a means for those selling links to tell search engines that they meant no harm.

This was a point I made in my original write up on nofollow, Google, Yahoo, MSN Unite On Support For Nofollow Attribute For Links. Nevertheless, the issue of using nofollow in relation to paid links really exploded when O'Reilly was found to be selling links in August.

O'Reilly In Debate Over Link Selling covers that situation as well as the issue of selling links and influencing search engines by buying them in much greater depth. Nofollow was a solution, I explained, for any publisher not wanting to be accused of doing something wrong.

Soon after, in Text links And PageRank, Google's Matt Cutts urged the use of nofollow as a safe way for people to buy links, along with a warning that sites selling links without doing this might not pass along PageRank.

Not everyone agrees with Matt, as you can see in the comments below his post or in this discussion over at our Search Engine Watch Forums: Google's Matt Cutts On Link Selling: Sites Might Not Pass Reputation; Buyers Might Get Targeted More.

Now skip forward to last week. Zawodny Says No to Link Condoms from Greg Boser covers how he received an email telling him that he could now buy links on Jeremy's site (Dave Naylor got a similar email) and the irony that those links don't make use of the link condom in the way that Google would prefer, as likely would Jeremy's employer, Yahoo. Or maybe not Yahoo, given that, as some of the articles above detail, it has come under accusations that its $300 per year Yahoo Directory is nothing more than a giant link selling network.

I was actually going to drop a note to Jeremy and Matt to get both of their views on all of this before posting, but Sponsored Links from Jeremy over at his blog saves part of that work. In it, he explains his viewpoint on not using nofollow. To stress his main points:

  • I didn't hide the links. (Remember the WordPress fiasco?)  
  • They're clearly labeled as sponsored links.  
  • They're far less annoying than distracting graphical ads.  
  • I've made it possible for anyone to comment on them. In public. Who else does that?  
  • They don't show up in my RSS feed(s).  
  • I rejected the on-line casino, drug sales, cheap hotels, and really offensive stuff--basically, anything the reminded me of blog comment spam I've bit hit with or that sends me to a sleazy feeling site. No need to encourage 'em.  
  • The links aren't permanent. They go away after a month (see below).

Those are all fine points, but none of them except the last are likely to make Matt over at Google happy. I'll try to channel him, as well as comments on paid links in relation to their impact on search relevancy:

  • The WordPress fiasco wasn't over hiding paid links. It was about having tons of doorways that were found through a hidden link. Different issue completely. WordPress Caught Spamming After Enlisting To Fight Spam covers it in more depth.  
  • Sponsored links have been labeled at other places. That doesn't rob them of link juice. Someone's bought a link on Jeremy's blog with the link text "Local Coupons." That's going to contribute in some way to help it rank for those terms on search engines. The Sponsored Links label doesn't prevent that. Nofollow does.  
  • Whether the links are distracting or not compared to graphical ones isn't why this has come up. The issue is whether the links are contributing to search results being degraded.  
  • Commenting on them is nice, but it still doesn't pull away from the relevancy issue.  
  • Doing some rejecting is fine, but I don't recall Yahoo or other search engines saying anything like, "Don't buy links, unless your company isn't in a sleazy, blog spamming industry."  
  • Links going away is a positive, I suppose, at least to those who might feel Jeremy is messing with search results by selling links.

Further down, Jeremy talks about no one being up in arms about the Google AdSense links he carries. Yep, and O'Reilly In Debate Over Link Selling has me covering exactly this same point, when it came up at one of our conferences. AdSense ads are sponsored links -- they're just "safe" sponsored links in terms of search relevancy that Google doesn't mind.

Need more from Matt directly? SES Chicago 2005 I think has some fresh comments from him below the post on link selling, as does Tell me about your backlinks.

What's going to happen to Jeremy? As Greg notes, he's not going to be yanked from Google. His site is far too important for that. But Google might prevent it from passing along link juice to others. Apparently, I'm told by others (not Google itself) that Google's done the same to Search Engine Watch because of our SEW Marketplace ads that we sell.

If so, Google's just stupid. If it can't figure out that we carry the same sponsored links in the same area and filter out that part, really -- they're dumb. They're even dumber if they have to wipe out the ability of an entire site to help influence its results in a good way. We link to many excellent things -- including things Google wants people to know about. Our links don't carry weight because Google's not smart enough? And Jeremy's site might not carry weight as well? Please.

If you're interested, that O'Reilly In Debate Over Link Selling covers the former paid link program we had here and how ultimately, the SEW Marketplace ads might move to using nofollow down the line. But since none of these were ever sold as ways to help people rank, it's kind of a pain to have to retroactively make that type of move.

Want to comment or discuss? Visit our forum thread, Yahoo's Zawodny In Paid Links & Nofollow Debate.

Postscript: Matt adds his thoughts on the situation over in Text link follow-up which in summary says yes, he still thinks nofollow is the way to go, but Jeremy's free to do as he likes on his site, just as search engines will be free to decide what sites they want to trust based on linkage patterns. But it's more fun to read his actual post, especially because he plays a game of Six Degrees, getting from a paid link to a sex positions web site in two mouse clicks.

Posted by Danny Sullivan at 10:31 AM | Permalink

November 22, 2005

Boser On Reciprocal Links & Whether Inbound Links Can Harm

Greg Boser in The Truth About Reciprocal Link Networks on his new blog looks at how a former client built up 7,000 links in a month and a half, skyrocketing activity that may have blown up for the person in the latest Oct. 2005 Google Jagger update. He also digs into the GotLinks link network that the person was involved with and comes away unimpressed. He also raises the worrisome issue that links out of your control could potentially harm you.

Posted by Danny Sullivan at 7:21 AM | Permalink

November 14, 2005

HotNacho On The WordPress Spam Saga

WordPress Spam Scam Explained is an undated article giving the Hot Nacho side of the WordPress spam saga from owner Chad Jones. It might not be new, but I just heard about it via Aaron's SEO Book blog. It highlights how while WordPress was back in Google within a day, HotNacho and other sites owned by Jones remain banned.

That's the biggest takeaway, showing exactly what I said in my article about the WordPress case. If a site is important enough, search engines simply cannot ban it despite spamming issues because it will hurt relevancy. If you type in [wordpress], then you want to find the WordPress site. But if you're a nobody site, look out -- spam a search engine, and there's no particular reason for them to let you back in.

Of course, there are going to be people searching for HotNacho on Google and not finding it because of the ban, and that's actually bad relevancy and somewhat troublesome overall for a service that's supposed to be helping organize the world's information.

It's one thing to ban a site for ranking well on non-navigational terms. But if I type [hotnacho] or [hot nacho] into Google, I really ought to be able to find that site as being relevant for that navigational query. Right now, it doesn't come up. It doesn't come up at Yahoo, either, which wasn't impressed with the HotNacho software.

Yeah, it's sucky to include a link at all to something that you feel like is undermining the quality of your service (and sorry Chad, the pages I saw were sucky and being semi-automated rather than fully-automated in creation doesn't somehow make them better). But for some perspective, what do you think is worse, HotNacho or Nazis? Go search for [nazi] on Google, and it will happily send you off to the American Nazi party. But HotNacho? Oh, no -- now that would be evil.

Postscript: Be sure to see Greg Boser's funny observations here, as well.

Posted by Danny Sullivan at 10:27 AM | Permalink

November 11, 2005

Learning About and Understanding Web Spam

Here's some great reading, viewing and material for your reference shelf. With web spam continuing to be a very hot topic, I wanted to point out two papers and the slide presentations that accompany each of them. Both do a great job of describing web spam issues and making them understandable. Both papers were given during the 14th International World Wide Web Conference and AIRWeb05 that took place in May.

Title: Web Spam Taxonomy (9 pages; PDF) Authors: Zoltán Gyöngyi, Hector Garcia-Molina.

Abstract: Web spamming refers to actions intended to mislead search engines into ranking some pages higher than they deserve. Recently, the amount of web spam has increased dramatically, leading to a degradation of search results. This paper presents a comprehensive taxonomy of current spamming techniques, which we believe can help in developing appropriate countermeasures.

Slide Presentation Here (23 slides; PDF).

Title: Web Spam, Propaganda and Trust (9 pages; PDF) Authors: Panagiotis Takis Metaxas, Joe DeStefano

Abstract: Web spamming, the practice of introducing artificial text and links into web pages to affect the results of searches, has been recognized as a major problem for search engines. It is also a serious problem for users because they are not aware of it and they tend to confuse trusting the search engine with trusting the results of a search.

Slide Presentation Here (27 slides; PDF).

Posted by Gary Price at 11:06 PM | Permalink

Moving To Trusted Links & Change The Link Election Model

Thank you, Aaron. That's for taking the research paper (PDF file) about detecting link spam that Gary wrote about earlier and breaking it down in non-technical language (and Jim Boykin summarizes Aaron further here). Aaron finds that the paper says having .edu and .gov links is a good thing, that you shouldn't worry about having a few spammy links, and that the more trusted links you have, the better.

I was thinking last night about the way to describe some of the changes or generational evolution we've been seeing with counting links, and I thought it might be helpful to break it down this way:

Counting Links / Referendum: Before Google, other search engines made use of links to determine which sites might be important. But this was mainly a counting exercise. The more links the better, regardless of the quality of those links.

  • In simple terms, each link counted the same.  <