Matthew McDermott

Matthew is a principal consultant for Catapult Systems. A Microsoft SharePoint Server MVP, Matthew blogs about SharePoint and Microsoft technologies related to collaboration, web content management and productivity.

SharePoint Image Search (Part 1)

Introduction

After working on several search projects, it is clear that the out-of-the-box search experience for images is, well, less than optimal. I mean, look at how cool the Microsoft Live Search Images interface is. Then hover over an image and look at how cool the property display is!

How about the results in SharePoint…

OK, I know, SharePoint is a document-centric application and it is built for the indexing, searching and display of documents. I agree! But what if your company's documents ARE images, no matter where you store them? You may use file shares, SharePoint Document libraries, SharePoint Picture libraries, and SharePoint Publishing Images libraries.

Why I love SharePoint

If I had to sum up why I love SharePoint, it is this: SharePoint is very flexible. So you can't do GREAT image search out of the box…so what! The designers at Microsoft built the search capability with two important concepts in mind:

  1. Today we don't know what folks will want to crawl tomorrow, so make it flexible enough to configure the crawls for other file types.
  2. Today we don't know how our users want their result displayed tomorrow, so make the search interface flexible enough to render results any way the customer wants.

A Series of Posts

It is with these two concepts in mind that I set out to teach SharePoint to do image search better than it does out of the box. This is the first post in a series where I'll address the configuration areas required to improve the out of the box handling of image files (JPEG, GIF, PNG, etc.). My goal with this series of posts is to present a structured process for addressing a search project of this nature and present a solution that I think most folks can understand.

The four parts are:

  1. Index Engine Configuration – Can the indexer find your images?
  2. Property Configuration – Are you getting the metadata?
  3. Search Results Configuration – Does search render results?
  4. More Search Results – Do my results look great?

Environment Setup

We will start with a test environment with images loaded in 4 locations: File Share, SharePoint Picture Library, SharePoint Document Library, and SharePoint Publishing Images Library. Images are handled differently by SharePoint depending on their location. I use:

  • The same images in each location
  • exactly 10 images of each type in each location (this makes testing much easier)
  • three different file types: JPEG, GIF and PNG

The environment is MOSS Enterprise SP1 with the Infrastructure Update applied. The screenshots you will see may differ from your environment if you are still working with an SP1 environment without the Infrastructure Update, but the concepts are the same. I am using a Search Center with Tabs. I configure a tab with XML results based on this post. This configuration makes life easy for testing.

Content Source Configuration

To make things easy I am going to configure two Content Sources, one for our test intranet site and one for the file share. Since SharePoint provides these protocol handlers out of the box, there is nothing special to do here besides set them up. For testing this makes it easy to re-crawl only the target of your tests. Later you can consolidate the content sources if you like.

First Crawl Results

My first crawl results are unimpressive, but that was to be expected. Let's look at what we actually found. One of my images is Amelia.jpg. If I search for "Amelia" I get 7 results: 4 are the lists, libraries and pages related to the lists, and 3 are the actual list items. None are the actual files, and the file share images are not present at all.

I execute the following property search: fileextension:jpg (or fileextension:gif or fileextension:png) and see the following results…

The result set contains only 10 images, but there should be 40: 10 images of the given type times 4 different indexed locations. What's the deal? The results are the same for all three extensions. The only images we are finding with a recognized file extension are those from the Picture Library.
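To keep the arithmetic straight while testing, here is a small Python sketch of the expected numbers. It only models the test layout described above (10 images of each type in each of 4 locations); the function names are my own, and it does not talk to SharePoint at all.

```python
# Test layout from the post: 10 images of each type in each of 4 locations.
LOCATIONS = [
    "File Share",
    "Picture Library",
    "Document Library",
    "Publishing Images Library",
]
EXTENSIONS = ["jpg", "gif", "png"]
IMAGES_PER_TYPE_PER_LOCATION = 10


def property_query(extension):
    """Build the keyword property query used in the post, e.g. fileextension:jpg."""
    return f"fileextension:{extension}"


def expected_hits():
    """Hits we would expect per extension if every location were fully indexed."""
    return IMAGES_PER_TYPE_PER_LOCATION * len(LOCATIONS)


for ext in EXTENSIONS:
    print(property_query(ext), "should return", expected_hits(), "items")
```

Anything less than that number for a given extension means at least one location is not contributing files with a usable file extension.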

Hey! Where are my files?

Out of the box SharePoint does not include JPG, GIF and PNG in the crawl. So the files on the file share are excluded, but since the files are also stored in SharePoint libraries, the list items are included in the index. Looking at the returned XML answers the question. How do I view my results as XML?

The results are of Content Class STS_ListItem_PictureLibrary; the index is returning the list item, not the image. Looking at the crawl logs for each content source confirms this. The file share returned no files because the crawled File Types list does not include JPG, PNG, or GIF.

The document and publishing images library returned no images, just list items.
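If you save the XML from the results tab, a quick tally of the content class on each result shows what the index is really returning. The sketch below is Python and works on a made-up sample: the element names (All_Results, Result, contentclass) and the second title are assumptions for illustration, so adjust them to match the XML your own results page produces.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample of raw search results XML. The element names are
# assumptions for illustration; check them against your own output.
SAMPLE_RESULTS_XML = """
<All_Results>
  <Result>
    <title>Amelia</title>
    <contentclass>STS_ListItem_PictureLibrary</contentclass>
  </Result>
  <Result>
    <title>Another image</title>
    <contentclass>STS_ListItem_PictureLibrary</contentclass>
  </Result>
</All_Results>
"""


def count_content_classes(xml_text):
    """Count how many results fall under each content class."""
    root = ET.fromstring(xml_text.strip())
    return Counter(
        result.findtext("contentclass", default="(missing)")
        for result in root.iter("Result")
    )


counts = count_content_classes(SAMPLE_RESULTS_XML)
for content_class, total in counts.items():
    print(content_class, total)
```

If every result lands in a list item content class, you are looking at list items and not at the image files themselves.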

File Types

The list of File Types determines what files to include in the index. If the extension is not listed here the file will not be crawled. So click the link and add JPG, GIF and PNG.
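The rule is simple enough to model in a few lines of Python. This is only a sketch of the idea (an extension must be on the list for a file to be crawled), not how the indexer is implemented, and the starting list below is a short illustrative one, not the real default File Types list.

```python
import os


def is_crawled(filename, file_types):
    """Return True if the file's extension appears in the File Types list."""
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    return extension in file_types


# Illustrative starting list only; NOT the actual out-of-the-box list.
file_types = {"doc", "xls", "ppt", "htm"}
print(is_crawled("Amelia.jpg", file_types))  # image extension not listed yet

# Add the three image extensions, as described in the post.
file_types |= {"jpg", "gif", "png"}
print(is_crawled("Amelia.jpg", file_types))  # now the file qualifies
```

Remember that a change to the File Types list only shows up in results after the content has been crawled again.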

Second Crawl

Closer! We got all of our images from all of our content sources.

The interesting thing is that our property search fileextension:jpg returns only 20 items: 10 from the Picture Library and 10 from the file share. What about the document libraries? The out-of-the-box document libraries do not return an attribute that maps to FileExtension. Picture Libraries return "File Type", which is mapped to FileExtension. Likewise, PictureWidth is not returned from the file share or the document library, but is returned from the Picture Library and the Publishing Images library.
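Here is the same observation written down as data, in a short Python sketch. The table only records what this second crawl showed for the two properties discussed here; the names PROPERTY_COVERAGE and locations_with are my own, and nothing in it queries SharePoint.

```python
# Which locations returned which property after the second crawl,
# as observed in this post.
PROPERTY_COVERAGE = {
    "File Share": {"FileExtension"},
    "Picture Library": {"FileExtension", "PictureWidth"},
    "Document Library": set(),
    "Publishing Images Library": {"PictureWidth"},
}
IMAGES_PER_TYPE_PER_LOCATION = 10


def locations_with(prop):
    """List the locations whose items carry the given property."""
    return sorted(
        location
        for location, props in PROPERTY_COVERAGE.items()
        if prop in props
    )


def expected_hits(prop):
    """Items of one file type that a query on this property can match."""
    return IMAGES_PER_TYPE_PER_LOCATION * len(locations_with(prop))


print(locations_with("FileExtension"), expected_hits("FileExtension"))
print(locations_with("PictureWidth"), expected_hits("PictureWidth"))
```

The gaps in that table are exactly the properties we still need to chase down.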

OK…Hey! Where are my properties?

Next time!

 

Posted by Matthew McDermott on Thursday, 28 Aug 2008 09:24
15 Comments | Filed under: Search, Web Publishing

Comments

On 29 Aug 2008 11:36, Eric said:

Funny, I am needing a solution for this right now. Looking forward to the next installments...please hurry! :)

On 01 Sep 2008 05:10, Matthew said:

Well Eric, all 4 parts are posted now. I hope they help you!

On 07 Sep 2008 11:29, Harish Mathanan said:

Matthew, can't tell you how grateful I am with this 4 part article. Stuff like this makes slogging through blogs worth it :)

On 09 Oct 2008 04:22, Brian Grabowski said:

Good stuff! Can anyone confirm that documents, files, etc., must be checked in to be indexed and ultimately appear in search results? -Brian

On 13 Oct 2008 07:33, Matthew said:

Brian, If your crawl account has read access to the library (standard, recommended configuration) then, yes, you have to have the document checked in for the changes to show in search results.

On 29 Oct 2008 08:21, Adrian DeFazio said:

I've tried your suggestion above about including the file types (GIF, PNG, JPG) in our environments search settings. When I perform a full crawl on our site, and then perform a search, I'm still only getting the list items. Any ideas on why my search isn't returning the images?

On 10 Feb 2009 05:11, Amreesh Sharma said:

Hi, I have a problem related to search. I have 3 different sites under one site collection and I have created a scope of all document libraries in all the sites. I want to see frequently searched documents and recently searched documents separately. Please suggest a solution. Your help will be highly appreciated. Regards Amreesh Sharma

On 11 Feb 2009 09:46, Matthew said:

Amreesh, I would take a look at surfacing the query logs. There is an administrative interface in the API. (Or you can ask the question in the forums.)

On 17 Feb 2009 11:27, Jeff said:

Thanks for this tip. However, how do I get fileshare results to open in the appropriate application instead of Internet Explorer?

On 18 Feb 2009 07:17, Matthew said:

Jeff, That is a browser setting. Or you can right click the file and choose to open it. It has nothing to do with SharePoint. What file type?

On 19 Feb 2009 09:22, Tim said:

Great stuff! Thanks a lot for this post, Matthew, it was really really helpful. Good explanation and the xsl code saved me loads of time not programming it ;) Great

On 06 Apr 2009 09:29, sye said:

Hi, thanks for the posts! Btw, what if the images are all stored in Document libraries (I name them Image Libraries)? Does that mean the file extension method for GIF and JPG would not map from Doc libraries? Is there any method to map the GIF and JPG extensions from Doc Libraries? Thanks and appreciated.

On 07 Apr 2009 05:07, Matthew said:

sye, the fileextensions from document libraries should index in the fileextension property. Make sure that you are indexing those file extensions in the File Types setting.

On 16 Jun 2009 09:41, Sixpointsteve said:

Nice article, hope I can get the code right! Do I have to use an iFilter or can we use the metadata associated with the SharePoint Image Library? Sorry, haven't read all four parts...

On 19 Jun 2009 08:21, Matthew said:

Six, It depends on what metadata you want. If you just need the basics that are provided by the Image Library then you don't need an iFilter.
