
What is the best way to selectively re-index certain properties for an install with SOLR?

Question asked by binduwavell on Dec 13, 2013
Latest reply on Aug 4, 2014 by sunnrunner
This is a hypothetical and open-ended question, but we have run into it quite a few times, so it is a real issue.

Imagine I have a property in my data model that has a certain indexing configuration (tokenization, etc.) and for some reason we need to change that configuration in a live system. What tools and techniques do we have to get the index updated without re-indexing the whole repository?

Some real-world examples (somewhat generalized):
<ul>
<li>We have a non-tokenized date property that we now need to tokenize in order to do range searches on it.</li>
<li>We have a non-tokenized text property that has a constrained list of values. The business requirement changes and now we have to make the field free-text editable, so we need to tokenize the field.</li>
<li>We have a property that is not indexed at all, and now we need to mark it for indexing.</li>
<li>We have excluded certain types of documents from indexing. The business requirements change and we now need those documents to be indexed.</li>
</ul>

Approaches we have considered:
<ul>
<li>Blow away the index and just re-build the whole thing.</li>
<li>Write a script that "touches" each affected node, causing it to be fully re-indexed. We are unsure whether we can disable auditing and versioning behaviors so that the nodes are re-indexed without their audit properties or version history being updated.</li>
<li>Write a script that finds the transaction ids for each affected node and then uses the specialized SOLR URLs to request re-indexing of each of these transactions.</li>
<ul>
<li>Obviously we'd have to blow away the model cache and have it rebuilt with the updated model first. We are unsure what issues this would cause.</li>
</ul>
<li>Similar to the above, I think there are SOLR URLs for re-indexing individual nodes rather than full transactions. It is unclear whether this ends up being the same thing.</li>
</ul>
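
To make the last two approaches concrete, here is a minimal sketch that builds one re-index request URL per transaction or node. It assumes the SOLR admin endpoint accepts `action=REINDEX` together with a `txid` or `nodeid` parameter, which is how I understand the specialized URLs to work; the base URL, the optional `core` parameter and the function names are placeholders of mine, so check them against the SOLR documentation for your version before relying on this. The sketch only builds the URLs and does not send them.

```python
from urllib.parse import urlencode

# Assumed location of the SOLR admin endpoint; adjust scheme, host and port.
SOLR_ADMIN_CORES = "https://localhost:8443/solr/admin/cores"


def reindex_url(id_param, value, core=None, base=SOLR_ADMIN_CORES):
    """Build one REINDEX URL for a single transaction id or node id.

    id_param is "txid" or "nodeid" (parameter names assumed from the
    admin URLs mentioned above); core is an optional, assumed core name.
    """
    if id_param not in ("txid", "nodeid"):
        raise ValueError("id_param must be 'txid' or 'nodeid'")
    params = [("action", "REINDEX"), (id_param, int(value))]
    if core is not None:
        params.append(("core", core))
    return base + "?" + urlencode(params)


def reindex_urls(id_param, values, core=None):
    """Build URLs for many ids, dropping duplicates but keeping order.

    Several affected nodes often share a transaction, so de-duplicating
    avoids requesting the same transaction more than once.
    """
    unique_ids = list(dict.fromkeys(values))
    return [reindex_url(id_param, v, core=core) for v in unique_ids]


if __name__ == "__main__":
    # Hypothetical ids, e.g. gathered by a script that queries the repository.
    for url in reindex_urls("txid", [1001, 1001, 1002]):
        print(url)
```

Sending the requests is left out on purpose: the SOLR endpoint is often reachable only over SSL with a client certificate, so the HTTP call depends on how the install is secured.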

Ideally, we could post a URL to SOLR that lists one or more properties. SOLR would then find all content with those properties and update the index information for just those properties. For example, if I want to update a date property, I don't really want to re-index all of the document contents.

Questions, comments & suggestions greatly appreciated!
