Faceted Search with Bobo

Introduction

Who hasn't had a client say "build me a search function like eBay (or Amazon, Ebuyer etc.)"?

Usually the client has no idea of the scale of the request they've just made, but as we'll find out, Bobo does a pretty good job of handling it all.

Why facet with Bobo?

Good question! The core Lucene project (which powers Examine) comes with built-in faceting, but that's written in Java. The .NET port, Lucene.Net, is a little behind the times (version 3.0.3, to be precise, and we don't see true faceting until version 3.4 or so).

Examine also has its own implementation of faceting, but it's in an experimental branch, so it requires re-compiling Examine (and potentially keeping it up to date with future Umbraco releases).

As we'll see, Bobo is not just a faceting engine. It provides basically everything you need to create a browse-driven web experience.

In their own words:

"Bobo Browse is an information retrieval technology that provides navigational browsing into a semi-structured dataset. Beyond the result set from queries and selections, Bobo Browse also provides the facets from this point of browsing."

Features:

  1. No need for cache warm-up for the system to perform: good for low-traffic sites.
  2. Multi-value sort - sort documents on fields that have multiple values per doc, e.g. tokenized fields: this is only useful if you implement your own LuceneIndexer.DocumentWriting event handler.
  3. Fast field value retrieval: you can fetch the search results directly from Bobo, avoiding an Umbraco content lookup if you need maximum speed (I don't personally do this; assuming you've implemented pagination, Umbraco will barely feel a tickle).
  4. Facet count distribution analysis: erm? Answers on a postcard?
  5. Stable and small memory footprint: good.
  6. Support for runtime faceting: see point 4.

As an aside, a discussion of faceting wouldn't be complete without touching upon Solr. Similar to Bobo, Solr imposes a structure on your data via a schema and allows you to search, sort and facet all kinds of document-orientated data. But this requires a not-insignificant investment of time to keep the Solr index in sync with your Umbraco content, something that Examine handles for free.

Build Bobo

Now this may look like a lot of effort, but stick with me, it'll be worth it in the end.

Until Umbraco Examine is updated to use version 3.0.3 of Lucene.Net, we need to use the old, and slightly unloved, version of Bobo. This requires a little work, but I've detailed the steps below:

Download a zip of the source code. Don't open the solution in Visual Studio yet! First, replace the log4net.dll in \DllReferences with the one from the Umbraco distribution. With that done, open the solution in Visual Studio (I used VS2010; if you have a newer version it will likely want to upgrade the project), then open the Properties of each of the three projects in the solution in turn (BoboBrowse.Net, BoboBrowse.Tests & LuceneExt.Net) and change:

  1. The Target Framework to (at least) version 4 (not the Client Profile).
  2. Uncheck the "Sign the assembly" checkbox on the "Signing" tab. The version of log4net that ships with Umbraco isn't signed, and a signed (strong-named) assembly can't reference an unsigned one, so in turn Bobo can't be signed.

Finally, build the solution and you will have a shiny BoboBrowse.Net.dll (& friends) in the \Deployment folder.

Copy the contents of this folder to the \bin folder of Umbraco and reload the admin area.

The Demo

I've put together a little demo based on the TXT Starter Kit for Umbraco (this demo was written for Umbraco 7.1.9, but the version of Lucene.Net shipped with Umbraco has been stable for ages, so the core concepts will work on any version of Umbraco from 4.7 onwards).

Let's start with adding some content that we can facet on. I've added a property to the News Item Document Type called Category:

Add Property

And then populated each news article with some sample categories:

Populate Property

What we're aiming for is a search page with a keyword search plus our Category facet a little like this:

Noquery

I sincerely hope yours will be prettier!

The Code

Most of the following is just a re-spun version of the usage sample from the project's home page at https://bobo.codeplex.com/ with a sprinkling of configuration from my last project.

Let's start by building a BrowseRequest.

// creating a browse request
var browseRequest = new BrowseRequest
{
    Count = 10, // Page size
    Offset = 0, // Page size * zero-based page number
    FetchStoredFields = true, // Fetch data from stored fields
    Sort = new[] { new SortField("updateDate", SortField.STRING, true) } // string sort, reversed so the latest updateDate comes first
};

As you can see, Bobo handles all the logic of efficiently paging through your result set; all you need to do is specify a page size and build a pagination UI (not covered in this demo).
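If you do build that pagination UI, the Offset arithmetic from the comment in the request above is easy to get off by one page. Here is a minimal sketch, assuming zero-based page numbers (page 0 is the first page); the `Paging` class and its methods are hypothetical helpers, not part of Bobo:

```csharp
using System;

// Hypothetical helper (not part of Bobo) for turning a zero-based page
// number into the Count/Offset values a BrowseRequest needs.
public static class Paging
{
    // Offset of the first hit on a zero-based page: pageSize * pageNumber.
    public static int Offset(int pageSize, int pageNumber)
    {
        if (pageSize <= 0) throw new ArgumentOutOfRangeException("pageSize");
        if (pageNumber < 0) throw new ArgumentOutOfRangeException("pageNumber");
        return pageSize * pageNumber;
    }

    // Number of pages needed to show totalHits results, rounding up.
    public static int TotalPages(int totalHits, int pageSize)
    {
        if (pageSize <= 0) throw new ArgumentOutOfRangeException("pageSize");
        if (totalHits < 0) throw new ArgumentOutOfRangeException("totalHits");
        return (totalHits + pageSize - 1) / pageSize;
    }
}
```

With a page size of 10, the third page (page number 2) starts at offset 20, and 21 hits need 3 pages. The total hit count you feed into `TotalPages` would be whatever total your browse result reports.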

Next, we'll add in the keyword search provided by the user. This will be interpreted as a raw Lucene query, so you may want to sanitise the user's input a little. Here's one I created earlier:

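// parse a query
var query = Request["q"];
if(!string.IsNullOrEmpty(query)) {
    // "bodyText" is the default field for terms without a field prefix. KeywordAnalyzer leaves each
    // term exactly as typed (no lowercasing), so consider using the analyzer the index was built with.
    var parser = new Lucene.Net.QueryParsers.QueryParser(Lucene.Net.Util.Version.LUCENE_29, "bodyText", new Lucene.Net.Analysis.KeywordAnalyzer());
    // Parse throws a Lucene.Net.QueryParsers.ParseException on malformed input (e.g. an unbalanced quote)
    Lucene.Net.Search.Query q = parser.Parse(query);
    browseRequest.Query = q;
}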
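One simple way to sanitise the input is to escape the characters that Lucene's classic query parser treats as syntax, so the user's keywords are matched literally. Lucene.Net's QueryParser also has a static Escape method that does much the same job and is probably the better choice in real code; the sketch below just makes the idea explicit. The `QuerySanitiser` class is hypothetical:

```csharp
using System.Text;

// Hypothetical helper: escapes Lucene query-syntax characters so user
// keywords are matched literally rather than parsed as operators.
public static class QuerySanitiser
{
    // Characters the classic Lucene query parser treats as syntax.
    private const string SpecialChars = "\\+-!():^[]\"{}~*?|&";

    public static string Escape(string input)
    {
        if (string.IsNullOrEmpty(input)) return string.Empty;

        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (SpecialChars.IndexOf(c) >= 0)
            {
                sb.Append('\\'); // prefix each special character with a backslash
            }
            sb.Append(c);
        }
        return sb.ToString();
    }
}
```

You would then call `parser.Parse(QuerySanitiser.Escape(query))`. Bear in mind that escaping `*` and `?` also disables wildcard searches, and the words AND, OR and NOT are still treated as operators by the classic query parser.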

Here's where I deviate from the Bobo sample usage. Each facet requires two entities in Bobo: a FacetHandler and a FacetSpec. I chose to initialise them both at the same time as follows:

ICollection<FacetHandler> handlerList = new List<FacetHandler>();

// define the facet output spec used for all facets
var facetSpec = new FacetSpec { OrderBy = FacetSpec.FacetSortSpec.OrderHitsDesc, ExpandSelection = true };

// Add a facet
var fieldName = "category";
handlerList.Add(new SimpleFacetHandler(fieldName));
browseRequest.SetFacetSpec(fieldName, facetSpec);

The chosen FacetHandler specifies the general behaviour of the facet. For example, you can use a RangeFacetHandler to facet on pre-defined ranges, e.g. price ranges or date ranges. Here we're using the SimpleFacetHandler, which expects a single value per document (a category in this example). If a news item could be assigned several categories, you would need to upgrade to the MultiValueFacetHandler.
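To make concrete what a facet handler plus an OrderHitsDesc spec gives you: for each value of the field, count how many matching documents carry it, then sort by that count. The sketch below does this over plain in-memory data purely as an illustration (Bobo does the real work far more efficiently against the index). Each document holds a list of values, so the same code covers single-value and multi-value fields. `FacetCounter` is hypothetical, and the alphabetical tie-break is this sketch's choice, not necessarily Bobo's:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustration only: counts facet values over a set of matching documents,
// highest hit count first, similar to what OrderHitsDesc presents.
public static class FacetCounter
{
    // Each document is represented by the list of values it holds for one field.
    public static List<KeyValuePair<string, int>> Count(IEnumerable<IList<string>> matchingDocs)
    {
        var counts = new Dictionary<string, int>();
        foreach (var docValues in matchingDocs)
        {
            // Distinct so a value repeated within one document counts once.
            foreach (var value in docValues.Distinct())
            {
                int current;
                counts.TryGetValue(value, out current);
                counts[value] = current + 1;
            }
        }

        // Highest hit count first; ties broken alphabetically for a stable order.
        return counts
            .OrderByDescending(kv => kv.Value)
            .ThenBy(kv => kv.Key, StringComparer.Ordinal)
            .ToList();
    }
}
```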

Now we stick it all together by:

  1. Grabbing an Examine Searcher for the built-in 'External' index.
  2. Then grabbing the underlying Lucene IndexSearcher.
  3. Extracting the IndexReader which is what Bobo will operate on. 
  4. Wrapping the vanilla IndexReader with a BoboIndexReader.
  5. Executing the browse request we've built up.

var searchProvider = ExamineManager.Instance.SearchProviderCollection["ExternalSearcher"] as LuceneSearcher;
var searcher = (IndexSearcher) searchProvider.GetSearcher();
var reader = searcher.GetIndexReader();

// decorate lucene reader with a bobo index reader
BoboIndexReader boboReader = BoboIndexReader.GetInstance(reader, handlerList);

// perform browse
IBrowsable browser = new BoboBrowser(boboReader);
var results = browser.Browse(browseRequest);

This returns a result containing the Lucene documents that match your query (just the requested page of them), along with the facet counts.

As Lucene documents consist of key-value pairs of string data, they're not the nicest things to work with, especially when we have the awesome Umbraco at our fingertips. So the following step extracts the node ids of the matching documents (only for the current page, if there are a lot of matches) so that we can fetch lovely Umbraco dynamic content items to razor all over!

// create collection of Umbraco NodeIds from Hits.
var resultNodeIds = results.Hits.Select(x => x.StoredFields.Get("id")).ToList();
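The stored `id` field comes back as a string. Umbraco.Content is happy with that as it stands, but if you want typed ids (say, to cache or compare them), a defensive parse that skips anything unexpected is cheap. A small sketch; `NodeIds.Parse` is a hypothetical helper:

```csharp
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper: converts stored "id" field values into integer node ids,
// skipping null or non-numeric entries instead of throwing.
public static class NodeIds
{
    public static List<int> Parse(IEnumerable<string> storedIds)
    {
        var ids = new List<int>();
        foreach (var raw in storedIds)
        {
            int id;
            // NumberStyles.Integer tolerates surrounding whitespace and a leading sign.
            if (int.TryParse(raw, NumberStyles.Integer, CultureInfo.InvariantCulture, out id))
            {
                ids.Add(id);
            }
        }
        return ids;
    }
}
```

In the demo this would read `var resultNodeIds = NodeIds.Parse(results.Hits.Select(x => x.StoredFields.Get("id")));`.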

HTML

I suspect most people reading this will have a fair idea of how they want their search results to look so I'll keep this brief.

<form method="post">
    <div class="row">
        <div class="9u">
            <label for="q">Search</label>
            <input name="q" id="q" value="@Request["q"]" />
            <input type="submit" value="Search" />
        </div>
    </div>

    <div class="row">
        <div class="3u">
            @foreach (var facet in results.FacetMap)
            {
                <h4>@facet.Key</h4>
                <ul>
                    @foreach (var facetValue in facet.Value.GetFacets())
                    {
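                        // Request[...] joins multiple ticked values with commas, so compare whole values
                        var selectedValues = (Request["facet_" + facet.Key] ?? string.Empty).Split(',');
                        var chck = selectedValues.Contains(facetValue.Value.ToString()) ? "checked" : null;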
                        <li><label><input type="checkbox" name="facet_@facet.Key" value="@facetValue.Value" checked="@chck" />@facetValue.Value</label>: @facetValue.HitCount</li>
                    }
                </ul>
            }
            <input type="submit" value="Update" />
        </div>
        <div class="6u">
            <ul>
            @foreach (var resultNodeId in resultNodeIds)
            {
                var node = Umbraco.Content(resultNodeId);
                <li><a href="@node.Url">@node.Name</a></li>
            }
            </ul>
        </div>
    </div>
</form>

In this snippet, I've:

  1. Iterated over the available facets, generating a checkbox for each, and then
  2. Iterated over the search results, outputting a link to each matching page.
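Restoring the checkbox state has one subtlety: when several boxes with the same name are ticked, Request["facet_category"] returns the values joined with commas, e.g. "news,events". Testing that string with Contains is a substring match, so a value such as "new" would also appear selected. Splitting and comparing whole values avoids that. A sketch; `FacetSelection.IsSelected` is hypothetical, and it assumes facet values never contain commas themselves:

```csharp
using System;
using System.Linq;

// Hypothetical helper: decides whether a facet value was among the
// comma-joined values posted for a checkbox group.
public static class FacetSelection
{
    public static bool IsSelected(string postedValues, string facetValue)
    {
        if (string.IsNullOrEmpty(postedValues) || facetValue == null) return false;

        // Compare whole values, not substrings ("news,events" does not select "new").
        return postedValues
            .Split(',')
            .Any(v => string.Equals(v, facetValue, StringComparison.Ordinal));
    }
}
```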

It isn't pretty; it's not even clever. OK, there's no need to laugh!

Searching the demo for keyword 'Lorem'

Filtering

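Now that we have checkboxes to let the user filter by the available facets, we need to pass their selections on to Bobo. The following needs to run before browser.Browse(browseRequest) is called, so that the selections become part of the request: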

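// read facet selections from the form post
foreach(string key in Request.Form.AllKeys) {
    if(key == null || !key.StartsWith("facet_")) {
        continue;
    }
    var facet = key.Substring("facet_".Length);
    // add a selection; GetValues returns each ticked checkbox as a separate value,
    // whereas Request.Form[key] would join them into one comma-separated string
    BrowseSelection sel = new BrowseSelection(facet);
    foreach(var value in Request.Form.GetValues(key)) {
        sel.AddValue(value);
    }
    browseRequest.AddSelection(sel);
}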

Here we're simply:

  1. Iterating over the Form collection
  2. Picking out the facet selections and
  3. Dumping them straight into the BrowseRequest.
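Checkboxes make one detail easy to miss: a single facet_ key can carry several values, which NameValueCollection.GetValues returns separately while the indexer joins them with commas. The sketch below groups the posted facet fields into field → values, each of which could then become one BrowseSelection with one AddValue call per value. `FacetForm.Read` is hypothetical and works on any NameValueCollection, so it can be tried outside a web request:

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Specialized;

// Hypothetical helper: extracts "facet_<field>" entries from posted form data.
public static class FacetForm
{
    private const string Prefix = "facet_";

    public static Dictionary<string, List<string>> Read(NameValueCollection form)
    {
        var selections = new Dictionary<string, List<string>>();
        foreach (string key in form.AllKeys)
        {
            // Keys can be null for malformed posts; skip anything that isn't a facet field.
            if (key == null || !key.StartsWith(Prefix, StringComparison.Ordinal)) continue;

            string field = key.Substring(Prefix.Length);
            string[] values = form.GetValues(key); // one entry per ticked checkbox
            if (values == null || values.Length == 0) continue;

            selections[field] = new List<string>(values);
        }
        return selections;
    }
}
```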

It really couldn't be simpler!

Filtering the news articles by category. As if you needed proof!

You can find the complete demo macro partial here.

Wrapping Up

Whilst this has been a whistle-stop tour of Bobo, we have built up a fully functional search page with faceting. However, there are some subtleties that we haven't touched upon, such as how the analyser used to build the index has a big impact on the behaviour of the faceting (e.g. the External index uses a StandardAnalyzer by default, which is why the facets appear in lowercase in the screenshots above).

Hopefully I've whetted your appetite enough to want to continue reading about Bobo and/or Lucene, the technology that underpins Examine. Enjoy!

Antony Briggs

Antony is on Twitter as