My home PC has started running, to paraphrase an Election Night Dan Rather, slower than a lame horse in molasses on a January morning. So I need to reinstall, prompting quite a bit of soul searching on my part: what operating system (OS) do I install; what programs do I really need; what do I need on my computer to be good at my job?
I've come up with a list of about 15 programs I really need, but so as not to make this post go on forever, I've divided the list into categories. Let's start with those programs that save you time.
Top 5 Timesavers
1. Launchy - If you have Launchy, you understand why I can't live without it. If you don't have it, prepare for your life to change. Launchy is a keystroke launcher, which basically means it's a better way of doing everything on your computer: launching programs, finding files on your desktop, performing web searches, visiting web sites. It can even give you local weather and perform calculations. All you need to do is press whatever shortcut key you've assigned to Launchy, start typing, and whatever program or file you want comes up. Forget the Start Menu; forget Windows Explorer; forget your browser. Launchy will change all that--and save you a considerable amount of time while doing it.
2. X1 - A few of these tools, like Launchy above and Ergo below, perform desktop search. But none of them do it as well as X1, which includes live searching abilities, numerous advanced search options, extensive previewing tools, and active email abilities. I'm organized (on the computer at least), but with hundreds of emails daily, along with reports for numerous clients, desktop search is a must-have time saver. No one does it better than X1. And, believe it or not, X1 is still free! You may not know that going to their site, as they only offer a preview of the newest version, but you can download older versions, which still work better than any other option, in the X1 Forums.
3. Ergo - The last search tool I use is Ergo. It's a cool visual search engine that combines a bunch of web search options with desktop search. What I really use it for is the annotation tools it has to mark up and share websites, and the cool grouping options it has to parse or organize search results. Truly smart search may still be a dream (especially for us SEOs, as it would mean the end of keyword research), but visual search tools like Ergo and SearchMe and clustering tools like Vivisimo's Clusty provide the next best thing: the ability to find what you are actually looking for before you go through results. Trust me; when you can search without browsing, you'll find the site you need in half the time.
4. Snag-it - Last but not least, Snag-it has proved invaluable for me when it comes to reporting. If you take as many screenshots as I do--of great search results, YouTube honors, social bookmarking and networking standings and occasional snafus--you know the hassle of trimming shots in Word or PhotoShop. And if you need to blur something out or add any effects, a 1-minute task blooms into a 10-minute endeavor. If you work with more than one monitor, double those estimates. Snag-it solves all that; copy only what you want from the screen and add effects on the fly. It's the only piece of software on this list that isn't free, but it's worth it.
Posted by EliFeldblum at 4:24 AM | Permalink
Widemile Inc. has announced the beta availability of its third-generation optimization and multi-variate testing platform. The new technology allows users to simultaneously test a variety of offers, text, images, and other key variables.
The announcement coincides with 13 partnerships with companies participating in the initial platform. Those companies include Ascentium, Avenue A | Razorfish, Brand Digital, Closed Loop Marketing, DDB in Seattle, Palazzo Intercreative, POP, Portent Interactive, Red Bricks Media, SolutionSet, Stratigent, TMP Directional Marketing, and ZeroDash1.
"Widemile's third-generation software-as-a-service (SaaS) multivariate optimization system was specifically designed using open software and systems to meet enterprise standards for security, stability and performance," said Dean Kimball, Widemile co-founder and CTO. "Developed with partners in mind, the Widemile optimization system contains a wide range of testing, reporting and client management capabilities within an easy-to-use browser-based application, and provides a level of performance and interactivity that has previously only been possible with desktop applications."
Those partners are lining up to sing Widemile's praises. Randy Barney, Director of Site Optimization, Avenue A | Razorfish said, "We're excited about Widemile's approach and toolset, which is structured to scale with our business and client needs," while Lance Loveday, CEO of Closed Loop Marketing articulated that "Widemile is positioned well to enable us to seamlessly provide optimization services to our clients."
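For readers new to multivariate testing, here is a minimal Python sketch of the basic idea: every combination of the variables under test becomes its own page variant. The variable names and values are made up for illustration, and the sketch says nothing about how Widemile's platform actually works.

```python
from itertools import product


def page_variants(variables):
    """Return one dict per combination of the values under test."""
    names = list(variables)
    combos = product(*(variables[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


# Hypothetical test plan: two offers, three headlines, two images.
VARIABLES = {
    "offer": ["free trial", "10% off"],
    "headline": ["Save time", "Save money", "Get started"],
    "image": ["product", "people"],
}

variants = page_variants(VARIABLES)
print(len(variants), "variants to test")
print(variants[0])
```

Even this small plan produces 2 x 3 x 2 = 12 variants to split traffic across, which is why tools that automate the bookkeeping are attractive.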
Posted by Nathania at 10:55 AM | Permalink
The New York Times, CNET, InformationWeek, and 52 other Google News sources missed the significance of Microsoft's new Research Lab in Cambridge, Mass., headed by Jennifer Chayes and her husband, Christian Borgs. The Times implied that Chayes and Borgs work in an ivory tower where basic research doesn't have a business imperative.
Nothing could be further from the truth in the online world.
Jennifer Tour Chayes, PhD in mathematical physics, led the highly esteemed Theory Group specializing in theoretical computer science. She's the co-author of almost 100 scientific papers and co-inventor of more than 20 patents. The New York Times only mentions her work in developing simple models of liquids and solids and the development of some exceedingly fast networking algorithms. Huh?
Their groundbreaking work in search engine algorithms and social search may be the foundation of a successful Microsoft-Yahoo merger.
Chayes is one of the world’s experts in the modeling and analysis of random, dynamically growing graphs (social graph, social search, Facebook, MySpace) – which are used to model the Internet, the World Wide Web and social networks.
One of the papers the couple co-authored, "Bid optimization in online advertisement auctions", details the ways paid search campaigns can be optimized by advertisers and search engines. "Multi-unit auctions with budget-constrained bidders", written by Borgs, Chayes, Nicole Immorlica (MIT), Mohammad Mahdian, and Amin Saberi (published in June 2005), discusses ways to optimize revenue for search engines given the fixed budgets of search marketers.
Their recent work provides a tutorial on search engine optimization and PageRank, before delving deep into algorithms few search marketers (myself included) understand.
Search engine optimization lives and dies by PageRank. Here's what you need to know about their research into PageRank.
Borgs and Chayes go beyond where a Web page ranks and explore the pages or sets of pages that contribute most to its rank. That's the foundation of link building. With the exception of link farms, link building has largely been a manual effort, somewhat arcane, but vital to SEO. PageRank contributions have been used for link spam detection and in the classification of web pages.
Chayes and Borgs note that a set of pages contributing significantly to the PageRank of a page is often called a "contribution set" or "supporting set" of the page. Their work goes a long way toward solving the mysteries of Google PageRank -- and fighting the spam that threatens to degrade the relevancy of all search engine results pages.
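To make the idea of a contribution set concrete, here is a rough Python sketch on a made-up five-page link graph. It relies on the standard identity that a page's PageRank is the average of the personalized PageRank scores it receives from every page, so each source page's share can be read off directly. This is a brute-force illustration, not the local algorithm from the Borgs and Chayes research; the graph, the page names and the 0.15 teleport value are all assumptions.

```python
def personalized_pagerank(links, source, alpha=0.15, iterations=100):
    """Random walk that restarts at `source` with probability alpha."""
    rank = {page: 1.0 if page == source else 0.0 for page in links}
    for _ in range(iterations):
        nxt = {page: alpha if page == source else 0.0 for page in links}
        for page, outlinks in links.items():
            share = (1 - alpha) * rank[page] / len(outlinks)
            for target in outlinks:
                nxt[target] += share
        rank = nxt
    return rank


def contributions(links, target, alpha=0.15):
    """Share of target's PageRank supplied by each page (shares sum to its PageRank)."""
    n = len(links)
    return {source: personalized_pagerank(links, source, alpha)[target] / n
            for source in links}


# Hypothetical site. Every page has at least one outlink, so the sketch
# does not need to handle dangling pages.
LINKS = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
    "farm1": ["blog"],
    "farm2": ["blog"],
}

shares = contributions(LINKS, "blog")
for source, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{source:6s} {share:.4f}")
print("PageRank of blog:", round(sum(shares.values()), 4))
```

The pages with the largest shares form the "supporting set" for the blog page. If most of a page's rank turns out to come from pages like the two hypothetical farm pages, that is the kind of signal a link spam detector could look at.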
Link spam can be detected in many ways besides the SpamRank-type algorithms: applying machine learning to link-based features, the analysis of page content, TrustRank, and Anti-TrustRank, and statistical analysis of various page features. Chayes, Borgs and their research associates use the local algorithm developed here to design several locally computable page features for link spam detection, and evaluate these features experimentally.
Chayes' contributions to Microsoft technologies include the development of methods to analyze network structure and behavior, auction algorithm design (i.e. paid search auctions), and online business model design and analysis.
She's famous for her work on phase transitions in problems in discrete mathematics and theoretical computer science. The result? The rise of some of the fastest known algorithms for fundamental problems in combinatorial optimization, the intersection of artificial intelligence, mathematics and software engineering. That would be search engine algorithms, paid search auctions and search engine revenue optimization.
Algorithms fuel search engines, spam filters, online advertising engines, social networks, machine translation and most of the online world. Social sciences - economics, psychology and sociology - analyze how and why people value things and study how people interact with each other. That's why, for example, Hal Varian plays a key role in Google's success as the company's chief economist.
That's why Google's Marissa Mayer says social search is the future of Google.
That's the core of Search Engine WarGames.
Posted by Kevin Heisler at 3:32 PM | Permalink
A number of the leading online news publishers are looking to organize greater control over how and what news of theirs gets listed in the search results of the various search engines, according to a report by the Associated Press.
"Currently, Google Inc., Yahoo Inc. and other top search companies voluntarily respect a Web site's wishes as declared in a text file known as 'robots.txt,' which a search engine's indexing software, called a crawler, knows to look for on a site," AP noted.
The individual engines also have their own proprietary code, however, and the publishers want greater influence on how this is developed and would like to see a unified methodology, the article reported.
The current system doesn't give sites "enough flexibility to express our terms and conditions on access and use of content," said Angela Mills Wade, executive director of the European Publishers Council, one of the organizations behind the proposal. "That is not surprising. It was invented in the 1990s and things move on," Wade told AP.
Robots.txt files were first developed in 1994 and have been the standard method webmasters use to block spiders (the crawlers search engines use to go through websites' content). However, there has been much conversation online over the past 5-6 years that some crawlers ignore the robots.txt file.
The publishers' "proposed extensions, known as Automated Content Access Protocol, partly grew out of those disputes. Leading the ACAP effort were groups representing publishers of newspapers, magazines, online databases, books and journals. The AP is one of dozens of organizations that have joined ACAP," AP noted.
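For anyone who has not looked inside a robots.txt file, here is a small sketch using Python's standard-library urllib.robotparser to show how a well-behaved crawler reads one. The rules, the ExampleNewsBot name and the example.com URLs are invented for illustration. As the AP points out, honoring these rules is voluntary, so the check only matters for crawlers that choose to run it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep every crawler out of /archive/ and
# keep one named crawler out of the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/

User-agent: ExampleNewsBot
Disallow: /
"""


def build_parser(text):
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
for agent, url in [
    ("ExampleNewsBot", "https://example.com/story.html"),
    ("OtherBot", "https://example.com/story.html"),
    ("OtherBot", "https://example.com/archive/2006.html"),
]:
    verdict = "allowed" if parser.can_fetch(agent, url) else "blocked"
    print(f"{agent:15s} {url:40s} {verdict}")
```

The format can only say which paths a crawler may or may not fetch. It has no way to express terms about how fetched content is used, which is the flexibility the publishers say they are missing.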
Posted by aussiewebmaster at 1:10 PM | Permalink
No way I could pass on pointing to the Debby Richman blockbuster post.
Wharton says: Online recommendation engines may chop off Long Tail of Search.
Prick up your ears, Chris Anderson. Your Long Tail doberman (below) is under attack:
> is in the Page Title…nice trick) www.pandia.com/sew/169-duplicate-content.html
Duplicate Content - Get it right or perish www.webmasterworld.com/google/3060898.htm
Regular Google results:
Duplicate Content Issues www.seroundtable.com/archives/003398.html
Avoiding Duplicate Content Penalties www.elixirsystems.com/seo_tips/avoiding-duplicate-content-penalty.php
Official Google Webmaster Central Blog: Deftly dealing with ... (ahem perhaps Webmaster Central Blog should get shorter Titles too?) googlewebmastercentral.blogspot.com/2006/12/deftly-dealing-with-duplicate-content.html
I would give this one a tie. It is probably due to the fact that the term “duplicate content” is so often used by bloggers and in forums that the top three are all SEO-related in the regular results of each of these searches. It seems interesting and also a good sign that Google’s Webmaster Central Blog does not show in the top three on the SEM-search feature (they are on the list). Obviously Google is not trying to manipulate any results in favor of their own blog.
I feel very confident that I will be using the SEM search blog almost exclusively to search for SEO and Paid Search topics in the near future. Thanks for this great tool, Alister, Lee and Google – I am pretty certain that you better get ready for lots of traffic from search marketers and students interested in the subject as well. In fact, if I were to advertise, student-populated sites would probably be my first target. After all, this will probably end up getting blogged fairly heavily, so those in the SEM community should find out very rapidly.
Please share your thoughts on this search function at the thread at SEW Forums, Google Custom Search For Search Marketers and Search Students
Posted by Chris Boggs at 10:13 AM | Permalink
Update: Liana Evans of Search Marketing Gurus has done a great job of journalism and corrected some of the errors that others and I posted about this story. You can read about it here.
Wikipedia founder Jimmy Wales plans to launch a search engine early next year in partnership with Amazon, according to the London Times.
Wales contends that Google has developed flaws as it has grown, and he believes he can use his wiki methodology to compete with Google, Yahoo and MSN.
He told the Times that computer algorithms do not make selections as well as humans do, and that people who get to use his alternative may prefer it.
“But we have a really great method for doing that ourselves,” Wales told The Times. “We just look at the page. It usually only takes a second to figure out if the page is good, so the key here is building a community of trust that can do that.”
"The reputation already fostered by his Wikipedia community and the transparency of his technology will build sufficient trust in his search engine to bring in advertising revenue and make the Wikiasari venture profitable," The Times reported.
The project has been called Wikiasari - a combination of Hawaiian and Japanese for "quick" "rummaging search". How this plays out should provide entertainment in the new year.
Update: Michael Arrington at TechCrunch has come up with a screen shot of the new engine. Looks like sponsored listings at the right, related links at the top and organic results where they normally appear.
There is also more detail and comments from Wales.
Posted by aussiewebmaster at 6:47 PM | Permalink
This weekend The Register published an article titled "Google developing eavesdropping software." The article describes how Google uses existing PC microphones and sound fingerprinting technology to show ads that are more relevant to you. The article goes on to explain how the sound fingerprinting works; it "breaks sound into five-second snippets to pick out audio from a TV, reducing the snippet to a digital 'fingerprint', which it matches on an internet server." Privacy folks are worried about the repercussions of such software.
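The general idea of a sound fingerprint can be sketched in a few lines of Python. This toy version is a simplification and not Google's method: it reduces a snippet to 16 bits that record whether loudness rises or falls from one slice to the next, then looks for the closest stored fingerprint. The two synthetic "shows", the 8,000-sample snippet length and all function names are made up for the example.

```python
import math


def snippet_fingerprint(samples, bits=16):
    """Reduce a snippet to `bits` bits: 1 where energy rises between adjacent slices."""
    size = len(samples) // (bits + 1)
    energies = [sum(s * s for s in samples[i * size:(i + 1) * size])
                for i in range(bits + 1)]
    fingerprint = 0
    for i in range(bits):
        fingerprint = (fingerprint << 1) | int(energies[i + 1] > energies[i])
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def best_match(fingerprint, database):
    """Name of the stored fingerprint closest to the one we heard."""
    return min(database, key=lambda name: hamming_distance(fingerprint, database[name]))


def make_show(envelope, length=8000):
    """Synthetic stand-in for TV audio: a tone whose loudness follows `envelope`."""
    return [math.sin(0.3 * t) * envelope(t) for t in range(length)]


show_a = make_show(lambda t: 1 + math.sin(0.002 * t))
show_b = make_show(lambda t: 1 + math.cos(0.005 * t))
database = {
    "show_a": snippet_fingerprint(show_a),
    "show_b": snippet_fingerprint(show_b),
}

heard = [0.5 * s for s in show_a]  # the same show picked up at half volume
print(best_match(snippet_fingerprint(heard), database))
```

Because the fingerprint only records rises and falls, the quieter copy still maps to the same bits. It also means only the small fingerprint, not the audio itself, needs to be sent to a server for matching.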
Postscript Barry: I should link to Google Paper Explains Listening To Your TV Can Help It Put Ads & Info On Your Computer, which we covered back on Jun. 9, 2006.
Posted by Barry Schwartz at 10:50 AM | Permalink
A New York Times article has a detailed analysis of Google's infrastructure and a discussion with Urs Hölzle, senior vice president for operations at Google. Here are some of the key points I pulled from that article.
+ Google tends to build from the ground up rather than buy.
+ Google's computing costs are half those of other large Internet companies and a tenth those of traditional corporate technology users.
+ Critics call Google's philosophy "unnecessary and inefficient."
+ "Google is reducing cost while maintaining performance by shifting the burden of reliability from hardware to software: individual hardware components can fail, but software automatically shifts the local task and the data to other machines."
+ Google is among Advanced Micro's five largest clients.
Posted by Barry Schwartz at 9:51 AM | Permalink
There are many people discussing a recent patent Google was awarded for picking up on ambient audio from your TV and pairing those sounds to your computer to serve up ads based on what you are watching (or something like that). Google Research Scientists Michele Covell & Shumeet Baluja described the technology as follows:
"We showed how to sample the ambient sound emitted from a TV and automatically determine what is being watched from a small signature of the sound -- all with complete privacy and minuscule effort. The system could keep up with users while they channel surf, presenting them with a real-time forum about a live political debate one minute and an ad-hoc chat room for a sporting event in the next. And, all of this would be done without users ever having to type or to even know the name of the program or channel being viewed. Taking this further, we could collect snippets from the web describing the actors appearing in a movie or present maps of locales within the movie as it takes place (no matter if users are watching it as a live broadcast or as a recorded broadcast)."
There are two additional articles that I am aware of that have good coverage of this. The first is at Small Biz Pipeline and the second is at TechCrunch. I particularly like how TechCrunch pulled out the four main points of the paper:
+ Personalized information layers: Here's what Tom Cruise is wearing in the show you are watching and here's where you can buy the same clothes in your zip code.
+ Ad hoc social peer communities: If you would like to chat about this show, ten of your college friends are watching it right now as well.
+ Real-time popularity ratings: Nielsen requires hardware and the results aren't available in real-time. You might want to know if there is a spike in viewers watching the show on channel 9 right now. Advertisers might want to know that too.
+ TV-based bookmarks: Click to save a show or clip into your video library and there will be more than just a few shows available for watching later.
Posted by Barry Schwartz at 8:43 AM | Permalink
Ever wonder how spiders/bots/crawlers behave? Well, if you did, a new analysis, "On Bots," was released at http://drunkmenworkhere.org/219. The article has an analysis and visualization of the behavior of search robots. The analysis covers Yahoo Slurp, Googlebot and MSNbot crawling 2 billion pages structured in a binary tree over 1 year. The study was conducted on a single site, so I am not sure how statistically valid it is over all sites on the Web. Just take a look at the overall results to see how much of a hog Yahoo is.
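To picture the test site, here is a short Python sketch of a binary tree of pages. It assumes, purely for illustration, that pages are numbered from 1 and that page n links to pages 2n and 2n+1; the study's actual URL scheme may differ. Under that assumption, 31 levels give roughly the 2 billion pages mentioned above.

```python
from collections import deque


def children(page_id):
    """Each page links to exactly two child pages (assumed numbering scheme)."""
    return (2 * page_id, 2 * page_id + 1)


def depth(page_id):
    """Number of clicks from the root page (page 1) down to this page."""
    return page_id.bit_length() - 1


def crawl_breadth_first(budget):
    """Order in which a level-by-level crawler would visit its first `budget` pages."""
    queue = deque([1])
    visited = []
    while queue and len(visited) < budget:
        page = queue.popleft()
        visited.append(page)
        queue.extend(children(page))
    return visited


LEVELS = 31
total_pages = 2 ** LEVELS - 1
print(f"{LEVELS} levels hold {total_pages:,} pages")
print("first pages crawled:", crawl_breadth_first(7))
print("deepest page reached:", max(depth(p) for p in crawl_breadth_first(1000)))
```

A tree like this makes it easy to compare bots by how deep they go and how many pages they request at each level.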
Posted by Barry Schwartz at 2:56 PM | Permalink
Microsoft's Camera Phone Search Project and Other Camera Phone Search Tech from ResourceShelf covers a new Microsoft Research project allowing you to take pictures of things in order to get search results back about them.
Snap something with your camera phone, then that goes into an image search database, which identifies the object or type of object in order to run other types of searches about it. Or that's the idea. You can't try it yet, and Microsoft isn't even certain what they may do with it.
How about searching by taking pictures of bar codes? Completely different idea than this project, but thanks for asking! The ResourceShelf post gives you resources on the whole Amazon bar code searching in Japan thing, for the curious. And Frucall, mentioned yesterday by Brian, deals with bar code searching as well. The downside is you have to key in the numbers.
Posted by Danny Sullivan at 7:31 AM | Permalink
Google Hires Orion Search Engine Creator; Gets Extraction Algorithm
Back in September, SEW Forums moderator Edel "Orion" Garcia posted a thread about a new search technology under development. It was coincidentally called the "Orion Search Engine" but not connected with our moderator. Instead, it was developed by a university student who now, according to news reports out this weekend, works for Google. Google's also acquired his search technology.
How great this search engine was is impossible to say. The press release that inventor Ori Allon put out last September was full of excitement, but so are plenty of releases trying to attract the attention of investors and the media. The search engine itself was never available for the public to use.
It sounds like Allon mainly developed an algorithm useful in pulling out better summaries of web pages. In other words, if you did a search, you'd be likely to get back extracted sections of pages most relevant to your query. From the release:
The results to the query are displayed immediately in the form of expanded text extracts, giving you the relevant information without having to go the website.
Such extraction could work well with moves by Google to expand the direct answers it offers, something all search engines are doing. Of course, the more Google and other search engines extract heavily from web pages without sending them actual traffic, the more likely they are to come under legal pressure for stepping over the fair use line.
Via Threadwatch, Google buys search algorithm invented by Israeli student from Haaretz has more details on Google getting the rights to the Orion algorithm and confirmation that Allon now works for Google. His university says that Yahoo and Microsoft were also in negotiations for the technology.
Google wins rights to Aussie algorithm from The Age reports that Allon's been with Google for about six weeks. However, Microsoft chairman Bill Gates never commented on the technology, to my knowledge. The Age just seems confused that Allon's press release mentioned public comments by Gates that there's room for improvement generally in search.
Google does deal for Aussie program from the Daily Telegraph pitches that the technology will revolutionize the way we search. Ho hum. Reality check, OK? When Google acquired the three people from Kaltix along with their search technology back in 2003, it hardly created a revolutionary change for us soon after.
By revolutionary, I mean a radical shake-up of how we search or a major leap-frogging past other players. That didn't happen post-Kaltix. We did indeed see better personalized search come from Google, which I find one of its most impressive features. But that's an evolutionary change. It works on top of other things Google has built. It doesn't overturn and throw out the base technology.
So my reality check alarm is mainly for anyone who thinks Google's going to suddenly change because Allon and this extraction algorithm are now at Google. He gives Google another good employee, and the technology will probably give Google another evolutionary change that may improve things over time, rather than instantly.
Want to comment or discuss? Visit our Search Engine Watch Forums thread, The Orion Search Engine.
Posted by Danny Sullivan at 7:56 AM | Permalink
Search Technologists Flake & Broder Speak
There are two good interviews with search technologists out there for us to read. John Battelle posted A Frank Interview with Gary Flake yesterday. Battelle introduces Gary Flake as "a veteran of Overture, Yahoo and now Microsoft's vaunted research labs (he's founder and director of the new 'Live Labs.')" Also, about a month ago, the Yahoo Search Blog posted "A chat with Andrei Broder" Part I, Part II and Part III. Andrei Broder was the VP of research and chief scientist at AltaVista, and is now the Yahoo Research Fellow and Vice President of Emerging Search Technology.
Posted by Barry Schwartz at 8:38 AM | Permalink
Second Issue of Google's Librarian Newsletter Released and More Interesting Reading on Web Search
The second issue of Google's Newsletter for Librarians is now available. It features an article by Karen Schneider, the director of the Librarians' Internet Index, the wonderful and important searchable directory of high quality web resources that I've mentioned on the blog and in SearchDay many times.
Schneider focuses on some of the critical information judgments needed in determining the trustworthiness of a site and the info that it contains. Those of us who attended library school are aware of many of these concepts. I hope Karen's article reaches more than just information professionals, including students, to whom these ideas should be taught and reinforced from the earliest grades forward.
Next, Matt "Jagger" Cutts is back with a look at how Google determines what sites are "most trusted." His article talks about the hundreds of factors (including some traditional info retrieval metrics) that Google looks at in addition to PageRank.
For a more in-depth discussion of this, you might want to pick up a copy of Chris Sherman's (yes, SearchDay's Chris Sherman) book, Google Power. You can preview the title via Amazon's Search Inside the Book. I was unable to find it using Google Book Search.
Bear in mind that Matt's article was written primarily for librarians and other information professionals. He explains that Google, like other engines, analyzes the actual content.
He points out that, "this [analysis] goes beyond scanning page-based text, which webmasters can easily manipulate through meta-tags."
While it's true that Google and other engines look to some degree at the meta-description tag, he doesn't mention that although the meta-keyword tag is still used by some, its value is not as great as it once was. Danny points this fact out in a 2002 article. You'll also find meta tags listed in this post from Barry.
Cutts goes on to write: "We also look at factors like fonts and the placement of words on a page. And we examine the content of neighboring pages, which can provide more clues as to whether the page we're looking at is trusted and will be relevant to users."
It would have been useful, particularly to the readers of this article, if Matt had explained that the factors listed above and many others can also be manipulated, or what others have termed "gamed."
As I've pointed out in many presentations to librarians, this is neither a good nor a bad thing but simply the way large general-purpose web engines work. For the librarian, a knowledge and understanding of this is important and useful.
After reading both Karen's article and Matt's piece, we see something of a disconnect between trustworthiness in terms of inclusion and good placement on a results page versus the trustworthiness concepts that a human might use to judge not only the quality of a web page itself but the data it contains. Yes, I'll readily admit to being a bit prejudiced here, but I think Karen's article also illustrates the value of just one of the many skills a well-trained librarian can offer.
Matt concludes with links to a few more excellent papers.
Btw, many of the same concepts (what Google calls and has patented as PageRank) are in place at just about every other major web engine. In other places, the concept is referred to as link analysis.
As a librarian, I would have loved it if Matt had thrown a "shout out" to Dr. Eugene Garfield, the father of citation analysis. It has been around since the 1950s, and librarians have been using it since day one. The relationship between citation analysis (something librarians understand) and link analysis (PageRank) is strong and is even noted in Brin and Page's seminal paper. One of the biggest differences is that web link analysis is much more open than traditional citation analysis and thereby harder to game (although to some degree it's also possible).
Yes, the concepts used in citation analysis are really what drive link analysis.
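For the curious, here's a rough sketch in Python of how a link-analysis score in the spirit of PageRank can be computed. To be clear, this is a simplified textbook-style iteration and not Google's actual implementation. The function name, the tiny four-page link graph and the iteration count are all made up for illustration, and the 0.85 damping value is just the commonly cited default. Think of each link as a citation: a page scores higher when more pages, and better-scored pages, point to it.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy link-analysis score. `links` maps each page to the pages it links to.

    Illustrative only: a simplified iteration, not any search engine's real code.
    """
    # Collect every page that appears as a source or a target of a link.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal scores

    for _ in range(iterations):
        # Every page gets a small baseline score, whatever links to it.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among the pages it "cites".
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical mini web: pages a, b and d all link to c; c links back to a.
example_links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(example_links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this made-up graph, page c ends up on top because three pages link to it, and page a outranks b and d because its one incoming link comes from the well-cited c. That is the citation analysis idea in miniature: who cites you matters as much as how many do.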
If you want to learn more, this post has tons of links and interviews about citation analysis. It also includes a link to Garfield's paper, "Citation Indexes for Science: A New Dimension in Documentation through Association of Ideas."
Finally, although this Scientific American article was written in 1999, I still think it's one of the best, especially for non-geeks, about web link analysis. It was written by members of IBM's Clever team.
Clever was a web search engine (never publicly released) developed by IBM. More about it here. The list of Clever team members reads like a "who's who" of web search, including Jon Kleinberg, Soumen Chakrabarti, and Prabhakar Raghavan, who is now the head of Yahoo Research.
As you review the article, take special note of the section where Clever and Google are compared. While Clever never made a public appearance, many of the concepts it offers are what power the Teoma/Ask Jeeves search technology.
Postscript: Yahoo's Prabhakar Raghavan offers archived materials from his Stanford classes on text and information retrieval online. Must-have content for those interested in the subject.
Posted by Gary Price at 11:58 AM | Permalink
First Issue Of Google's Newsletter For Librarians Released; Cutts Writes On Inverted Index & Ranking
In October, Danny blogged about Google's "coming soon" quarterly newsletter for librarians. Today, the first issue went live. It's available here.
I've posted a bit more on ResourceShelf.
The highlight of this issue is the Matt Cutts-authored article on how Google crawls content and ranks results, with a very nice explanation of how an inverted index works. For some librarians and many readers of this blog, it will be familiar material. Nevertheless, when Mr. Cutts writes, it's always a great read, and in this case an excellent review.
Posted by Gary Price at 5:36 PM | Permalink
Get Your Google Patents On CD
News.com has a nice mention of long-time search watcher Stephen Arnold having compiled more than 120 patents he believes belong to Google on a CD. Want to get them in one go? Visit his site, pay your $50, and there you go. Gary, of course, regularly posts here about patents and links to where you can download them for free (use that Legal: Patents link below this post if you are an SEW member for a fast way to see his past posts). But if you want to save yourself some time and love reading patents, this looks like an easy way to go.
Posted by Danny Sullivan at 8:26 AM | Permalink
Convera's Web Index Expands
Convera, a well-known name in enterprise search technology, posted a bit of news today that their web index continues to grow. So? It's been reported that Convera will enter the public web search space with the release of a web search tool by the end of this year. The company also announced that they've added 100 million images to their web index. Convera also recently announced that they've licensed use of their web database to an undisclosed U.S. Government organization.
Posted by Gary Price at 4:51 PM | Permalink
Two Roundups of New Search Technology and Services Available
If you're in need of a couple of roundups that look at various search tools and services, two of them are online today: one in Time magazine and the other in The Boston Globe. It's likely that the services and companies mentioned in these articles will be "new" to many readers of these publications. However, we've discussed and linked to many of them on the SEW Blog during the past year.
The Time magazine article, On the Frontier of Search, and the Boston Globe article, Cutting through search-engine clutter, include mentions of:
Time (Note: I was interviewed by a Time reporter for this story.)
Boston Globe
Quotes
"Search will ultimately be as good as having 1,000 human experts who know your tastes scanning billions of documents within a split second. It will model the human brain." Gary Flake, Distinguished Engineer at Microsoft
Note: Dr. Flake is the former head of Yahoo Research Labs. Here's an interview that I did with Dr. Flake in 2004.
Posted by Gary Price at 10:54 AM | Permalink