Share

LinkedIn

Indexing an External Database in Sitecore 7

Indexing data that does not live within the content tree can be tricky in Sitecore 7, however we walk you through steps to overcome this hurdle.

Sitecore 7 adds many improvements, with the inclusion of ContentSearch library, we are finally able to distance ourselves from the Lucene query syntax, and utilize the linq query notation to access the indexed items. Unfortunately the tradeoff is a new way of having Sitecore index any data that doesn’t reside in the content tree. This is a step by step walkthrough on how to pull an external database into the Sitecore Lucene index.

To get an index to function in Sitecore it requires 5 main components:

  • Creating and accessing a database table to index
  • Making an indexable object for the table you want to index
  • Building a crawler for the index
  • Writing a configuration file for the index
  • Processing and utilizing the index

Starting Up
First, you need to create a project for Sitecore. This entails creating a new web application, copying over the app_config folder from your Sitecore site (including the references to Sitecore.Kernel and Sitecore.ContentSearch) and setting the publishing target to the root of your website.

Accessing the Database

At this point you should also have a database to index. As shown below, I have created a simple test database.





Once the database is set up, we only really need one method to get the index up and running. In my example I created a quick EDMX and a data access class. I then access the table we’re indexing with a simple ‘get everything from my table’ method.

public class Repository
{
    public IEnumerable<Person> GetIndexablePeople()
    {
        IEnumerable<Person> indices = null;
        using (var db = new ExternalDBEntities())
        {
            indices = db.Person.ToList();
        }
        return indices;
    }
}

Indexable Objects
Once you have your data structure in place, the next thing you need is an object that the Lucene index can use to map. Lucene needs an object that inherits from IIndexable and the object mostly utilizes reflection which we will address in a moment.

public class IndexablePerson : IIndexable
{
    private Person _person;
    private IEnumerable<IndexableDataField> _fields;
 
    public IndexablePerson(Person person)
    {
        _person = person;
        _fields = _person.GetType()
                   .GetProperties(BindingFlags.Public
| BindingFlags.Instance
| BindingFlags.IgnoreCase)
                   .Select(fi => new IndexableDataField(_person, fi));
    }
 
    public string AbsolutePath
    {
        get { return "/"; }
    }
     
    public CultureInfo Culture
    {
        get { return new CultureInfo("en"); }
    }
     
    public IEnumerable<IIndexableDataField> Fields
    {
        get { return _fields; }
    }
     
    public IIndexableDataField GetFieldById(object fieldId)
    {
        return = _fields.FirstOrDefault(f => f.Id == fieldId);       
    }
     
    public IIndexableDataField GetFieldByName(string fieldName)
    {
        return _fields.FirstOrDefault(f => f.Name.ToLower() == fieldName.ToLower());
    }
     
    public IIndexableId Id
    {
        get { return new IndexableId<string>(_person.PersonId.ToString()); }
    }
     
    public void LoadAllFields()
    {
        _fields = _person.GetType()
                   .GetProperties(BindingFlags.Public
| BindingFlags.Instance
| BindingFlags.IgnoreCase)
                   .Select(fi => new IndexableDataField(_person, fi));
    }
     
    public IIndexableUniqueId UniqueId
    {
        get { return new IndexableUniqueId(Id, Guid.NewGuid()); }
    }
    public string DataSource
    {
        get { return "Person"; }
    }
}

Most of these fields aren’t needed right now, however if needed we could update the path to point to a ‘person’ page, or set the culture info. But for now, we only need to take note of the fields collection and the LoadAllFields method. Through these properties, Lucene pulls and maps the fields from your object to the index. It does this through the use of an IDataField, which is unsurprisingly little more than a field name, type and value.

public class IndexableDataField : IIndexableDataField
{
    private object _concreteObject;
    private PropertyInfo _fieldInfo;
 
    public IndexableDataField(object concreteObject, PropertyInfo fieldInfo)
    {
        this._concreteObject = concreteObject;
        this._fieldInfo = fieldInfo;
    }
 
    public Type FieldType
    {
        get{return _fieldInfo.PropertyType;}
    }
 
    public object Id
    {
        get{return _fieldInfo.Name.ToLower();}
    }
 
    public string Name
    {
        get{ return _fieldInfo.Name;}
    }
 
    public string TypeKey
    {
        get { return string.Empty; }
    }
 
    public object Value
    {
        get { return _fieldInfo.GetValue(_concreteObject); }
    }
}

Above is an example of a stripped down version of the datafield class, it’s simple but it works with our index.

Finally you will need a unique index. Below is an example that I am using for this project.

public class IndexableUniqueId : IIndexableUniqueId
    {
        private IIndexableId Id;
        private Guid guid;
 
        public IndexableUniqueId(IIndexableId Id, Guid guid)
        {
            this.Id = Id;
            this.guid = guid;
        }
        public IIndexableId GroupId
        {
            get{ return Id; }
        }
 
        public object Value
        {
            get { return guid; }
        }
    }

With these three classes in place, our foundation is set for the index. 

The Crawler

The index crawler is where the pieces of our data structure start to come together. Through the crawler we are able to provide Lucene with all the indexable objects we need. The crawler is a rather straight forward class, which grabs our objects from the database, pushes them into our indexable object and sends them down the Lucene pipeline. 

public class PeopleCrawler : FlatDataCrawler<IndexablePerson>
{
    protected override IEnumerable<IndexablePerson> GetItemsToIndex()
    {
        var people = new Repository().GetIndexablePeople().Select(p => new IndexablePerson(p));
        return people;
    }
 
    protected override IndexablePerson GetIndexable(IIndexableUniqueId indexableUniqueId)
    {
        //Get the indexable item based off the unique id
        return null;
    }
 
    protected override IndexablePerson GetIndexableAndCheckDeletes(IIndexableUniqueId indexableUniqueId)
    {
        //Get the indexable item from the database and check the version
        //Return the indexable item if the version matches what is in the unique id, if not return null
        return null;
    }
 
    protected override IEnumerable<IIndexableUniqueId> GetIndexablesToUpdateOnDelete(IIndexableUniqueId indexableUniqueId)
    {
        //Return the latest version of the item otherwise return null
        return null;
    }
 
    protected override bool IndexUpdateNeedDelete(IndexablePerson indexable)
    {
        //Set to false in SitecoreItemCrawler
        return false;
    }
}

As you can see, this crawler does nothing more than retrieve our items, and pass them into a constructor in our IndexablePerson class. The other methods would need to be fleshed out in a larger system but for our simple index where the use case doesn’t expect our data to change often, the only method we need is the GetItemsToIndex.

Configuring the Search Index

With the completion of the crawler, we have finished all the code necessary to get our index to run. We are pulling our items from the database, converting their properties to IDataFields, setting them as IIndexable objects and making them available to Lucene for indexing. Like most things in Sitecore, the index is entirely dependent on a configuration file to pull everything together and kick off the process.

I started the people index configuration by copying the Website/App_Config/Include/Sitecore.ContentManagement.Index.Master.config file, and renaming it to Product.config. You, however, can create an empty config in that folder and place this inside.

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <configuration type="Sitecore.ContentSearch.LuceneProvider.LuceneSearchConfiguration, Sitecore.ContentSearch.LuceneProvider">
        <indexes hint="list:AddIndex">
          <index id="People" type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, Sitecore.ContentSearch.LuceneProvider">
 
            <param desc="name">$(id)</param>
            <param desc="folder">$(id)</param>
            <param desc="propertyStore" ref="contentSearch/databasePropertyStore" param1="$(id)" />
 
            <configuration type="Sitecore.ContentSearch.LuceneProvider.LuceneIndexConfiguration, Sitecore.ContentSearch.LuceneProvider">
 
              <indexAllFields>true</indexAllFields>
 
              <fieldMap type="Sitecore.ContentSearch.FieldMap, Sitecore.ContentSearch">
                <fieldNames hint="raw:AddFieldByFieldName">
                  <field fieldName="productid" storageType="YES"  indexType="TOKENIZED"   vectorType="NO" boost="1f" type="System.Int32"  settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider" />
 
                  <field fieldName="firstname" storageType="YES" indexType="TOKENIZED"    vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
                    <analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer, Sitecore.ContentSearch.LuceneProvider" />
                  </field>
 
                  <field fieldName="lastname" storageType="YES" indexType="TOKENIZED"    vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
                    <analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer, Sitecore.ContentSearch.LuceneProvider" />
                  </field>
 
                  <field fieldName="height" storageType="YES" indexType="TOKENIZED"    vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
                    <analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer, Sitecore.ContentSearch.LuceneProvider" />
                  </field>
 
                  <field fieldName="weight"            storageType="YES" indexType="TOKENIZED"    vectorType="NO" boost="1f" type="System.Double" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
                    <analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer, Sitecore.ContentSearch.LuceneProvider" />
                  </field>
 
                  <field fieldName="age" storageType="YES"  indexType="TOKENIZED"   vectorType="NO" boost="1f" type="System.Int32"  settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider" />
 
                  <field fieldName="gender" storageType="YES" indexType="TOKENIZED"    vectorType="NO" boost="1f" type="System.Char" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
                    <analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer, Sitecore.ContentSearch.LuceneProvider" />
                  </field>
 
                </fieldNames>
              </fieldMap>
            </configuration>
 
            <locations hint="list:AddCrawler">
              <crawler type="ExternalDbIndex.Indexing.PeopleCrawler, ExternalDbIndex">
              </crawler>
            </locations>
          </index>
        </indexes>
      </configuration>
    </contentSearch>
  </sitecore>
</configuration>

The configuration looks daunting at the start, but let’s walk through some of the key points.

The first is that we aren’t using a custom provider because our index is a LuceneIndex, we need to include 3 parameters.

<param desc="name">$(id)</param>
<param desc="folder">$(id)</param>
<param desc="propertyStore" ref="contentSearch/databasePropertyStore" param1="$(id)" />

We now have to add a field mapping to our index configuration, each field you want to appear in the index needs to be mapped accordingly. Also, the IncludeAllFields flag allows us to only worry about the field mapping. You have the option of turning that flag off and moving to an include/exclude style, but for our example, the fields we place in our field map are the only ones we want access to.

<field fieldName="firstname" storageType="YES"
indexType="TOKENIZED" vectorType="NO"
boost="1f" type="System.String"
settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
       <analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer, Sitecore.ContentSearch.LuceneProvider" />
</field>

Finally we pull in our crawler. This crawler is what sets the whole process in motion.

<locations hint="list:AddCrawler">
    <crawler type="ExternalDbIndex.Indexing.PeopleCrawler, ExternalDbIndex">
    </crawler>
</locations>

Tying it Together

Now that your configuration is set, we should be able to deploy and see our index in action. Publish your site and log into the Sitecore desktop. Once there you should access the index manager through the Start->Control Panel.

With our index in place we should now see it as one of the options in the indexing wizard.



Select your index and click the Rebuild button.

You will find out pretty quickly if it works, it will look like the figure below if it does.



If the index did not process or threw an exception, the key points to look at are the configuration field mapping and the crawler type. As a side note, linq to entities creates most fields as nullable objects.  Lucene has a hard time working with the nullable class types, and you may have to decide if it’s easier to create an intermediary class or a partial class to massage the data into a format Lucene can more easily digest.

Once your index has processed you can view it by using the Index Viewer module (on Sitecore Marketplace).

Retrieving Indexed Items
The last piece of the project is to actually utilize all your freshly indexed data. The scaffolding is in place, the index is primed and ready and now we can jump into it through the linq query syntax through the ContentSearch library!

I first create a basic user control, and add in an ASP:Gridview as seen below.

<%@ Control Language="C#" AutoEventWireup="true" CodeBehind="PeopleTable.ascx.cs" Inherits="ExternalDbIndex.Controls.PeopleTable" %>
<div>
    <h2>People</h2>
    <asp:GridView runat="server" ID="gvPeople" />
</div>

Not much to it.  Now to the code behind.

public partial class PeopleTable : System.Web.UI.UserControl
{
    protected void Page_Load(object sender, EventArgs e)
    {
        gvPeople.DataSource = GetPeople();
        gvPeople.DataBind();
    }
 
    private object GetPeople()
    {
        using (var context = ContentSearchManager.GetIndex("People").CreateSearchContext())
        {
            var query = context.GetQueryable<Person>();
            return query.ToList();
        }
    }
}

This control is as basic as it can get. However, the notable part is where inside of GetPeople we are getting our index by the name we specified in our People.config file, creating our search context and receiving a queryable list of person objects. Based on need, we could easily update our query variable to filter, sort or skip and take the number of objects we need.

The End Result
Taking our website out of the oven, we can see the table indexed and returned our 4 people.



And we know we have successfully indexed an external database in Sitecore 7!

Sitecore development, Sitecore custom code

Comments

Add a Comment

*
*

Please confirm you are human by typing the text you see in this image:

Jan Löwgren said: 10/15/2014 at 4:24 PM

Very very nice. Thank you!

shirish gadre said: 5/25/2015 at 3:51 PM

the option to manually build the index does not show after ll the steps are followed. is there something more to be done?

Cameron Palmer said: 6/12/2015 at 9:01 AM

If you can't see the index in the list of indexes in sitecore's indexing manager, that usually indicates there is something in your Index Configuration that sitecore can't find. The index Manager will pull every <index> tag and put the ID in the list.
Check the transforms, and the hierarchy of the configuration to make sure your index is in the right place. then make sure nothing in the configuration is breaking (like mismatched quotes, or unclosed xml tags). Finally, you can try duplicating one of the other indexes right underneath the default indexes, and renaming the ID. Once you confirm that works, then you can update the configuration from there.

Jorge Lusar said: 2/26/2016 at 9:47 AM

I have done a blog post on how to do a Full Text Search in the Sitecore User Manager http://blog.jorgelusar.me/post/full-text-search-sitecore-user-manager/

Zac said: 8/5/2016 at 10:23 AM

Any luck with getting this running in Sitecore 8.1? Any pointers? I get the index to pull up and run, but nothing ever gets added.

Anil Mall said: 9/6/2016 at 1:13 AM

Hi Cameron,
Thanks for this post, It works for me but the only issue is how we can perform auto index rebuild on a record entry in database table. This is needed for real time data.

Kirty Sharma said: 1/5/2017 at 2:01 AM

Hi, Nice article. we have been able to index the sql table data into sitecore index using the above approach. Can you please guide me, how we can update the sitecore index when any change is made in sql server table. I dont want to Rebuild the index every time. Is there anything through which we can update only the modified sql data to sitecore index so that it will take less time to update the indexes.