Custom data and Examine searching

Heads Up!

This article is several years old now, and much has happened since then, so please keep that in mind while reading it.

Oh, hello! It's me again. You might remember me from such classics as "The worlds friendliest post on getting started with Examine". Now I'm back to bring you some "fresh meat" (Diablo pun intended) on how to extend Examine with your own custom data.

Who is this article for?

It is of course for everyone who wants to learn, but it is mainly for people who want to get custom data into a custom Examine index. If you have tried writing a Lucene/Examine query before and are not afraid to create a class file in Visual Studio, you will feel right at home :)

We are going to go through the following:

  1. Setting up our Index/Provider/Searcher
  2. Creating our class library
  3. Creating the indexer (here we pull down the JSON and index it)
  4. Extra goodies (a Web API controller to reindex on demand)

The example we are going to follow is a real-life one. It comes from a customer who offers many kinds of courses: whether you want to get better in a special area as a carpenter, bricklayer, accountant or anything else, they've got you covered. The issue is that they manage their courses in an external system, but we want to make them searchable on their website. We receive the data from that system as JSON.

Set up your Index/Provider/Searcher

The first step to get this snowball rollin’ is to set up our index, provider and searcher. To start, open the following files:

  • /config/ExamineIndex.Config
  • /config/ExamineSettings.Config

/config/ExamineIndex.Config

Inside ExamineIndex.Config, add this inside the <ExamineLuceneIndexSets> element:

<IndexSet SetName="ExternalCourseIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/{machinename}/ExternalCourse/">
	<IndexUserFields>
		<add Name="id" />
		<add Name="name" />
		<add Name="createDate" />
		<add Name="nodeTypeAlias" />
		<add Name="urlName" />
		<add Name="teaser" />
		<add Name="hideFromSearch" />
	</IndexUserFields>
</IndexSet>

An example of ExamineIndex.config with a custom index defined

In the file above we simply specify which fields we want in our index.

/config/ExamineSettings.Config

Inside ExamineSettings.Config we need to do two things. Here is the first.

Part #1 - ExamineIndexProviders/Providers

In the file, go to the <Examine><ExamineIndexProviders><providers> section and add the following:

 <add name="ExternalCourseIndexer" type="Examine.LuceneEngine.Providers.SimpleDataIndexer, Examine"
dataService="myCodeProject.Indexers.DetachedCourseIndexer, myCodeProject" indexTypes="CourseContentType" />

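There are two things to notice in this provider. First, the "type" is set to "SimpleDataIndexer". Second, the dataService is set to our own class, which we will get to in a second. This is the connection between our index and our code. Also note that no indexSet attribute is given: as far as I know, Examine falls back to a naming convention here and matches "ExternalCourseIndexer" to "ExternalCourseIndexSet".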

Part #2 - ExamineSearchProviders/Providers

Now we go a bit further down in the same file. Go to the <Examine><ExamineSearchProviders><providers> section and add the following:

<add name="ExternalCourseSearcher" type="Examine.LuceneEngine.Providers.LuceneSearcher, Examine"
analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"
enableLeadingWildcards="true"
indexSets="ExternalCourseIndexSet"/>

What to notice here is of course that indexSets points to the IndexSet we just created.

Now that we have set up our 3 core parts, let's move on to some real code.

---

So now let's spin up Visual Studio. If you don't have a Class Library for your code, go create one. I add my Class Library beside my web project (the one Umbraco is installed in) and name it myCodeProject. First, add UmbracoCms.Core through NuGet by opening the Package Manager Console and running: Install-Package UmbracoCms.Core.

Inside our freshly created project, create a folder named “Indexers”, and in this folder create a class named “DetachedCourseIndexer”.

So here is what we do in our new class:

  • Implement ISimpleDataService on our class “DetachedCourseIndexer”
  • Ensure an UmbracoContext
  • Get the external data from your data source
  • Turn your data into SimpleDataSets
  • Return the SimpleDataSets so they end up in our index

Things to think about here are the ID, DocTypeAlias and URL. In my example I just use a fake id (an incrementing integer), and for the URL I use an Umbraco node combined with the courseId field, so I can display course data at a URL like “/my-course?courseid=1”.

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Text;
using System.Threading.Tasks;
using System.Web;
using System.Web.Hosting;
using Examine;
using Examine.LuceneEngine;
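using learnmark.Models.Website.Course; // the author's own project namespace (presumably where CourseRaw lives); replace with yours
using learnmark.Models.Website.Items;  // the author's own project namespace; replace or remove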
using Newtonsoft.Json;
using umbraco;
using Umbraco.Core;
using Umbraco.Core.Logging;
using Umbraco.Web;
using Umbraco.Web.Security;

namespace myCodeProject.Indexers
{
    public class DetachedCourseIndexer : ISimpleDataService
    {
        public IEnumerable<SimpleDataSet> GetAllData(string indexType)
        {

            //Ensure that an Umbraco context is available
            if (UmbracoContext.Current == null)
            {
                var dummyContext =
                    new HttpContextWrapper(
                        new HttpContext(new SimpleWorkerRequest("/", string.Empty, new StringWriter())));
                UmbracoContext.EnsureContext(
                    dummyContext,
                    ApplicationContext.Current,
                    new WebSecurity(dummyContext, ApplicationContext.Current),
                    false);
            }

            var dataSets = new List<SimpleDataSet>();

            try
            {
                //Get data from external source
                string json = "";
                using (var webClient = new WebClient())
                {
                    webClient.Encoding = Encoding.UTF8;
                    json = webClient.DownloadString("Some url that returns course data as json");
                }

                //Parsing data to raw model 
                List<CourseRaw> coursesRaw = JsonConvert.DeserializeObject<List<CourseRaw>>(json);

                //Getting a node in umbraco to use as a url
                var coursePage = UmbracoContext.Current.ContentCache.GetById(2159); //CoursePage

                //Looping all the raw models and adding them to the dataset
                foreach (var cr in coursesRaw)
                {
                    var simpleDataSet = new SimpleDataSet { NodeDefinition = new IndexedNode(), RowData = new Dictionary<string, string>() };

                    string url = coursePage.Url + "?courseid=" + cr.Id;
                    simpleDataSet = ExamineHelper.CourseRawToIndexItem(cr, simpleDataSet, indexType, url);
                    dataSets.Add(simpleDataSet);
                }
            }
            catch (Exception ex)
            {
                LogHelper.Error<DetachedCourseIndexer>("error indexing:", ex);
            }

            return dataSets;
        }
    }
}

The sharp eye would say "Hey, my code says that I'm missing something called ExamineHelper and also CourseRaw", so let's quickly take a look at those. First of all, CourseRaw is just a model that I deserialize my JSON to.

public class CourseRaw
{
    // Property names mirror the (Danish) JSON keys from the external system
    public int Id { get; set; }
    public string Kode { get; set; }            // course code
    public string Navn { get; set; }            // name
    public DateTime Startdato { get; set; }     // start date
    public string Url { get; set; }
    public string TilmeldingsUrl { get; set; }  // sign-up URL
    public string Type { get; set; }
    public string varighed { get; set; }        // duration
    public string Kategoristi { get; set; }     // category path
    public int Pladser { get; set; }            // seats
    public int OptagedePladser { get; set; }    // seats taken
    public bool Optaget { get; set; }           // fully booked
    public bool ErGarantikursus { get; set; }   // is a guaranteed course
    public object Skolefag { get; set; }        // school subject
    public string Sted { get; set; }            // location
}

As you see, nothing exciting there, but now you have the full picture. The ExamineHelper is pretty much the same: it takes our various properties and maps them to our SimpleDataSet, which then gets indexed, like this:

 public class ExamineHelper
    {

        public static SimpleDataSet CourseRawToIndexItem(CourseRaw cr, SimpleDataSet simpleDataSet, string indexType, string url)
        {
            simpleDataSet.NodeDefinition.NodeId = cr.Id;
            simpleDataSet.NodeDefinition.Type = indexType;
            simpleDataSet.RowData.Add("id", cr.Id.ToString());
            simpleDataSet.RowData.Add("kode", cr.Kode);
            simpleDataSet.RowData.Add("name", cr.Navn);
            simpleDataSet.RowData.Add("createDate", DateTime.Now.ToString("yyyy-MM-dd-HH:mm:ss"));
            simpleDataSet.RowData.Add("nodeTypeAlias", "courseRaw");
            simpleDataSet.RowData.Add("urlName", url);
            simpleDataSet.RowData.Add("teaser", cr.Sted);
            simpleDataSet.RowData.Add("hideFromSearch", "0");

            return simpleDataSet;
        }
    }

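Apart from "kode", these fields are the ones I defined in the IndexSet at the start of the post. If you want "kode" in the index too, it most likely needs to be listed there as well.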
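Once the data is in the index, you will want to query it through the searcher we configured. Here is a minimal sketch of what that could look like, assuming a single-word search term and the "name" field from the IndexSet. The CourseSearch class, its namespace and the ByName method are hypothetical names made up for this example, not part of the original project.

```csharp
using System.Collections.Generic;
using System.Linq;
using Examine;
using Examine.LuceneEngine.SearchCriteria;

namespace myCodeProject.Search
{
    // Hypothetical helper for illustration only
    public static class CourseSearch
    {
        public static IEnumerable<SearchResult> ByName(string term)
        {
            if (string.IsNullOrWhiteSpace(term))
            {
                return Enumerable.Empty<SearchResult>();
            }

            // The searcher name must match the one added in ExamineSettings.Config
            var searcher = ExamineManager.Instance.SearchProviderCollection["ExternalCourseSearcher"];
            var criteria = searcher.CreateSearchCriteria();

            // Wildcard terms are not analyzed, so lowercase the term ourselves
            // to match what the StandardAnalyzer stored in the index
            var query = criteria
                .Field("name", term.Trim().ToLowerInvariant().MultipleCharacterWildcard())
                .Compile();

            return searcher.Search(query);
        }
    }
}
```

Each result exposes the indexed values through its Fields dictionary, so something like result.Fields["urlName"] should give you back the URL we stored for the course.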

And this is actually it. Now you're like, what!?! When you go into the Developer section of Umbraco and open the "Examine Management" dashboard, you should see the ExternalCourseIndexer. If you open it and push "Rebuild index", it will go to your data source and index it.

But hey! That is fine and all, but neither I nor the customer wants to go in and hit "Rebuild index" whenever some data is updated. How can we fix that?

The way we solve this type of situation is to have an UmbracoApiController that we can call every X minutes/hours, either through the scheduledTasks section of umbracoSettings.config or from some other service hitting it.

The ApiController could look like this:

  [JsonOnlyConfiguration]
    public class CourseApiController : UmbracoApiController
    {
        [HttpGet]
        public object Run()
        {
            try
            {
                DateTime startTime = DateTime.Now;
                var response = ExamineHelper.PostRebuildIndex("ExternalCourseIndexer");
                var timeSpent = DateTime.Now.Subtract(startTime).TotalSeconds.ToString();

                LogHelper.Info<CourseApiController>("CourseIndex msg: " + response);
                LogHelper.Info<CourseApiController>("CourseIndex indexed in (seconds): " + timeSpent);

                return "Courses were indexed in " + timeSpent + " seconds";
            }
            catch (Exception ex)
            {
                Error error = new Error("An error occurred on the server");
                LogHelper.Error<CourseApiController>(error.ToString(), ex);
                return Request.CreateResponse(JsonMetaResponse.GetError(HttpStatusCode.InternalServerError, error.Message, error));
            }
        }
    }
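If you go the scheduledTasks route, the entry in umbracoSettings.config could look roughly like the fragment below. This is a sketch under assumptions: the alias and the host name are placeholders I made up, and the URL assumes the controller is auto-routed by Umbraco at /umbraco/api/CourseApi/Run. The interval is given in seconds.

```xml
<scheduledTasks>
  <!-- Calls the reindex endpoint once an hour (3600 seconds).
       "rebuildCourseIndex" and www.example.com are placeholders. -->
  <task log="true" alias="rebuildCourseIndex" interval="3600"
        url="http://www.example.com/umbraco/api/CourseApi/Run" />
</scheduledTasks>
```

Keep in mind that the endpoint is publicly reachable in this setup, so you may want to protect it in some way before going live.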

And to finish this off, you can put this method into your ExamineHelper. It's the method that tries to rebuild your Examine index.

   /// <summary>
        /// Rebuilds the index
        /// </summary>
        /// <param name="indexerName"></param>
        /// <returns></returns>
        public static string PostRebuildIndex(string indexerName)
        {
            LuceneIndexer indexer;
            string returnmsg = "";
            var msg = ValidateLuceneIndexer(indexerName, out indexer);
            if (msg)
            {
                //remove it in case there's a handler there already
                indexer.IndexOperationComplete -= Indexer_IndexOperationComplete;
                //now add a single handler
                indexer.IndexOperationComplete += Indexer_IndexOperationComplete;

                var cacheKey = "temp_indexing_op_" + indexer.Name;
                //put temp val in cache which is used as a rudimentary way to know when the indexing is done
                UmbracoContext.Current.Application.ApplicationCache.RuntimeCache.InsertCacheItem(cacheKey, () => "tempValue", TimeSpan.FromMinutes(5), isSliding: false);

                try
                {
                    indexer.RebuildIndex();
                    returnmsg = indexerName + " has been rebuilt";
                }
                catch (Exception ex)
                {
                    //ensure it's not listening
                    indexer.IndexOperationComplete -= Indexer_IndexOperationComplete;
                    LogHelper.Error<ExamineManagementApiController>("An error occurred rebuilding index", ex);
                    returnmsg = string.Format("The index could not be rebuilt at this time, most likely there is another thread currently writing to the index. Error: {0}", ex);

                    return returnmsg;
                }
            }
            return returnmsg;
        }
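The method above calls two helpers that are not shown in this post: ValidateLuceneIndexer and Indexer_IndexOperationComplete. Below is a minimal sketch of how they could look, assuming they live in the same ExamineHelper class and that ValidateLuceneIndexer returns a bool, as the call above suggests. They are modelled loosely on what Umbraco's own Examine Management code does, so treat them as a starting point rather than the exact implementation used in the project.

```csharp
using System;
using Examine;
using Examine.LuceneEngine.Providers;
using Umbraco.Core;

namespace myCodeProject
{
    public class ExamineHelper
    {
        // ...CourseRawToIndexItem and PostRebuildIndex from above go here...

        // Looks up the indexer by name and checks that it is a Lucene indexer.
        // Returns false (and a null indexer) when it is missing or of another type.
        private static bool ValidateLuceneIndexer(string indexerName, out LuceneIndexer indexer)
        {
            indexer = ExamineManager.Instance.IndexProviderCollection[indexerName] as LuceneIndexer;
            return indexer != null;
        }

        // Static handler that fires when the index operation has finished
        private static void Indexer_IndexOperationComplete(object sender, EventArgs e)
        {
            var indexer = (LuceneIndexer)sender;

            // Make sure we stop listening once the operation is done
            indexer.IndexOperationComplete -= Indexer_IndexOperationComplete;

            // Clear the temporary cache item that marked the rebuild as running
            var cacheKey = "temp_indexing_op_" + indexer.Name;
            ApplicationContext.Current.ApplicationCache.RuntimeCache.ClearCacheItem(cacheKey);
        }
    }
}
```

The cache key has to match the one used in PostRebuildIndex, otherwise the "rebuild in progress" marker will simply hang around until it expires after 5 minutes.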

I think that is it, so good luck, and drop me a message if something is off.

If you want to learn more, check out some of these pages:

http://shazwazza.com/post/using-examine-to-index-search-with-any-data-source/
http://thecogworks.co.uk/blog/2013/02/11/examiness-hints-and-tips-from-the-trenches-part-8-custom-indexing

Rasmus Fjord
