Enhancing Google Search With ImageExchange

Posted on 12/2/2010 by Jim Pickerell

At first glance, PicScout’s new ImageExchange interface would seem to be a very helpful tool for professional users looking for images they can license legitimately. It isolates the easily licensable images in any Google or Yahoo! search and displays them in a right-hand panel next to all the returns delivered by these search engines. In fact, the returns delivered may be more misleading than useful.

There is a major problem that forces ImageExchange (IE) to miss hundreds, and maybe thousands, of professionally produced images that have been used online and that IE has “fingerprinted.” The problem is not with ImageExchange, but with the way Google and Yahoo! deliver returns from the massive numbers of unique images they have located on the web.

Regardless of how many images are found, the search engines will only show about 1,000, on the theory that no customer will look at more than 1,000 images. When a customer searches by keyword for an image, Google’s software asks more than 200 questions about the page where that image is found. It wants to know how many times the keyword appears on the page that contains the image, whether the keyword is in the title or the URL, whether a synonym for the word is included, whether the page is from a “quality” web site and more. It also determines the page rank, or importance, by looking at how many outside pages point to it. Yahoo! uses similar techniques.

All of these things are important if the searcher is looking for textual information, but to a great degree these questions have little, if anything, to do with the quality of the image on the page. Often the information on the page is in high demand and of great importance, but the picture is generic and not tightly related to the text. Nevertheless, the picture gets ranked based on the textual information that is associated with it, not on its own value because there is no easy way to measure visual or creative value.

Let me look at a few searches to better illustrate what happens.

On Google I searched for “apple fruit” (to avoid the computer company) and learned that Google has indexed 8,040,000 images. After 3 minutes ImageExchange found and displayed 17 images that can be easily and quickly licensed. It indicates whether each image is RF (royalty-free) or RM (rights-managed), and by clicking on the image the searcher can get immediate access to the web site where the image can be licensed. Does that mean that of the more than 8 million images of apples on the Internet only 17 belong to image creators who expect to be paid when their images are used? Absolutely not!

The flaw in all this is that these 17 are from the 1,000 Google has chosen to display. There is no way for the customer to see any of the other 8,039,000 images or determine whether any of them can be legally licensed as well. Some of these other images might be appropriate for the image searcher’s needs, but as far as the person doing the search can tell, only 17 of the more than 8 million images can be legally licensed. This leads the customer to a few false conclusions:

Only a minuscule number of images available on the Internet need to be licensed (17 out of 8,040,000).
Any other professionally produced image will be of poorer quality, or less relevance to the search, than the hundreds of other images – mostly produced by amateurs – that Google shows.
The quality and appropriateness of images produced by professionals are not nearly as good as those of amateur snapshots, because so few professional images can be found in Google’s selected 1,000.

Google’s theory is that customers should be able to use more keywords to narrow their search. Though that may work with textual information, it doesn’t work nearly as well with photos. Most photos on the Internet have not been properly keyworded or captioned. I searched for “apple fruit red” and Google says it has 4,730,000 images. After 3 minutes ImageExchange showed 22 out of the 1,000 Google was willing to show us. Some of the red ones that appeared in the first search didn’t appear in this search.

To make it really hard, I searched for “apple fruit red Vietnamese”. There are 512,000 hits for that search. After 3 minutes ImageExchange found one image, from Dreamstime, that could be licensed.

“Race horse California” had 1,740,000 hits, and after 3 minutes ImageExchange found 3 images that needed to be licensed. Within this group of 1,000 there were several versions of Eadweard Muybridge’s famous stop-action series of a horse in motion. One could be licensed from the Life Magazine archives, but a number were available from other sources. The implication is that if an image is not found by ImageExchange, then it is probably free for the taking. It is interesting that a search for “Eadweard Muybridge horse in motion” delivers the first 1,000 of 6,380 results. Since most of them must be available for free, why would anyone pay Life Magazine to use its image?

Could the search engines do better?

Easily! ImageExchange could be very useful if there were some way for it to compare its entire database (currently over 50 million images) with the entire Google database on any given subject and then deliver up to 1,000 images all of which would be available for licensing. This would be of tremendous value to the professional user.

Based on these few searches, and assuming that every 1,000 images Google has indexed contain a proportionate number available for licensing, it might require searching through roughly 59,000 of the over 8 million “apple fruit” images in order to find 1,000 returns available for licensing. While this is still a very small percentage of the total images Google has found, it is certainly a much more reasonable sample for customers to consider than a random 1,000. Such a strategy would also provide customers with a reasonable choice of “professionally produced” images. With “apple fruit red” the database would need to be compared with about 45,000 images. When we get to “apple fruit red Vietnamese” it is entirely possible that IE could compare its database to every image in the Google collection and still not find 1,000 returns. With “race horse California” IE might have to go through about 333,000 before it could produce 1,000 licensable images to review.
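The extrapolation above can be worked through in a few lines of Python. This is only a sketch of the arithmetic, not anything ImageExchange or Google actually runs. It assumes the hit rate seen in the 1,000 images Google displays holds across the entire index, which is exactly the assumption stated in the text. The function name `images_to_scan` is made up for illustration.

```python
def images_to_scan(licensable_in_sample: int,
                   sample_size: int = 1000,
                   target: int = 1000) -> int:
    """Estimate how many indexed images must be checked to find `target`
    licensable ones, assuming the sample's hit rate holds everywhere."""
    if licensable_in_sample <= 0:
        raise ValueError("need at least one licensable image in the sample")
    # Integer ceiling division: round up to a whole number of images.
    return -(-target * sample_size // licensable_in_sample)


# (licensable images found in the displayed 1,000, total images indexed),
# taken from the searches described in the article.
searches = {
    "apple fruit": (17, 8_040_000),
    "apple fruit red": (22, 4_730_000),
    "apple fruit red Vietnamese": (1, 512_000),
    "race horse California": (3, 1_740_000),
}

for query, (hits, indexed) in searches.items():
    needed = images_to_scan(hits)
    if needed <= indexed:
        print(f"{query}: scan about {needed:,} of {indexed:,} indexed images")
    else:
        print(f"{query}: all {indexed:,} indexed images would not be enough "
              f"(estimate: {needed:,})")
```

Under that assumption the estimates come to 58,824, 45,455, 1,000,000 and 333,334 images respectively; the figures in the paragraph above are rounded versions of these. The “Vietnamese” search is the one case where the estimate exceeds the 512,000 images Google reports.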

An important thing to note here is how inadequate Google or Yahoo! image searches really are. The perception is that Google has this fantastic algorithm that will somehow be able to pick out the best, most appropriate 1,000 images from 8 million or more. Anyone who does a cursory examination of the 1,000 images delivered in any of these returns – and I suspect in most other returns – would have to acknowledge that they have seen much better, more appropriate images of the topic on web sites they have visited. When doing the search for “apple fruit red Vietnamese”, in addition to apples Google returned pictures of tomatoes, carrots, strawberries, raspberries, peaches, pears, lots of family vacation pictures, men by their fruit trees in Ghana and Ayers Rock, to list only a few of the inappropriate images. If Google searched for images fingerprinted by ImageExchange, then at least when someone asked for apples they might get pictures of apples and not all the totally unrelated imagery that Google is delivering.

All search engines would have to do is offer their customers a choice of searching either for images that are easily licensable or for those where no information about licensing is available. These categories could be defined as collections for Professional and Non-Professional users. All the images in the Professional collection would require licensing of some type. When someone searches the Non-Professional collection they would not be shown any of the images in the Professional collection. That could result in a lot less unauthorized use of copyrighted images. Such a strategy wouldn’t prevent those who believe that everything on the Internet should be free from searching the Professional collection and then using images without negotiating rights to use them, but at least they would be more aware that they are stealing.

To find the easily licensable images the search engines would need to work out some arrangement to use the ImageExchange database. If they would match the IE database to all the images they have found, not just a random 1,000 images, they would have a much more useful tool. Such an option would better serve professional image users who want to legally license rights to the images they use, image creators who care about licensing their images and the search engines that choose to adopt the strategy.
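As a rough illustration of the proposal, the sketch below walks a search engine’s full ranked result list, not just the first 1,000, and keeps the images found in a database of licensable ones. Everything here is hypothetical: the function names, and the idea that each image is represented by a fingerprint string that can be looked up exactly, are assumptions made for the example. Real fingerprint matching compares images visually and is not a simple set lookup, and neither Google nor PicScout is known to expose an interface like this.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Set, Tuple


def licensable_results(ranked: Iterable[str],
                       licensed: Set[str],
                       limit: int = 1000) -> Iterator[str]:
    """Yield up to `limit` licensable images, in ranking order,
    scanning as far down the full result list as necessary."""
    matches = (fingerprint for fingerprint in ranked if fingerprint in licensed)
    return islice(matches, limit)


def split_collections(ranked: Iterable[str],
                      licensed: Set[str]) -> Tuple[List[str], List[str]]:
    """Split results into a Professional collection (licensable) and a
    Non-Professional collection (no licensing information available)."""
    professional: List[str] = []
    non_professional: List[str] = []
    for fingerprint in ranked:
        if fingerprint in licensed:
            professional.append(fingerprint)
        else:
            non_professional.append(fingerprint)
    return professional, non_professional


# Toy data: five ranked results, two of which are in the licensable database.
ranked_results = ["img-a", "img-b", "img-c", "img-d", "img-e"]
licensed_db = {"img-b", "img-e", "img-z"}

print(list(licensable_results(ranked_results, licensed_db, limit=1000)))
print(split_collections(ranked_results, licensed_db))
```

Because `licensable_results` is lazy, it stops reading the ranked list as soon as it has collected `limit` matches, so a popular subject would not require scanning every indexed image.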

However, I won’t hold my breath until Google, Yahoo!, Bing and others make some effort to improve their searches.

Copyright © Jim Pickerell. The above article may not be copied, reproduced, excerpted or distributed in any manner without written permission from the author. All requests should be submitted to Selling Stock at 10319 Westlake Drive, Suite 162, Bethesda, MD 20817, phone 301-461-7627, e-mail: wvz@fpcubgbf.pbz

Jim Pickerell is founder of www.selling-stock.com, an online newsletter that publishes daily. He is also available for personal telephone consultations on pricing and other matters related to stock photography. He occasionally acts as an expert witness on matters related to stock photography. For his current curriculum vitae go to: http://www.jimpickerell.com/Curriculum-Vitae.aspx.

Comments

John Harris Posted Dec 3, 2010
Thank you for this interesting article. Those whose business model is to do no more than aggregate content may find they are undermined if these issues are resolved...

Offir Gutelzon Posted Dec 3, 2010
Jim – Great topic to drive stock industry discussion and an important topic to engage the broader ecosystem partners, such as Google, Yahoo and Bing. You mentioned that the problem is not with (PicScout’s) ImageExchange; rather, it’s with the way Google and Yahoo deliver (image) returns. We believe the responsibility resides with both the search engines and the stock agencies.

Google has admitted that their user experience is not what it could or should be, and their users have clearly stated they want to see higher quality images. For stock agencies, embracing the power of technology and SEO can increase their image rankings, and participation in ImageExchange can both increase image sales and decrease unauthorized use by providing a connection regardless of where the search engine found the image.

We believe the future opportunity is to deliver against the promise of Every Image Gets Its Credit for licensors and Instant Image ID for content users. We invite your readers to learn more about how the industry can capitalize on and monetize images on our blog: http://blog.picscout.com/