@venur I have been curious about this and spent some time on the issue over the last couple of weeks. I think I have a solution that may fit your case. It worked for me in the few tests that I ran.
It is based on the links I shared above.
Here is what I did:
Rule file Latin-break-only-on-whitespace.rbbi (referenced by the rulefiles attribute of the tokenizer in the field type below):

!!forward;

$Whitespace = [\p{Whitespace}];
$NonWhitespace = [\P{Whitespace}];
$Letter = [\p{Letter}];
$Number = [\p{Number}];

# Default rule status is {0}=RBBI.WORD_NONE => not tokenized by ICUTokenizer
$Whitespace;

# Assign rule status {200}=RBBI.WORD_LETTER when the token contains a letter char
# Mapped to <ALPHANUM> token type by DefaultICUTokenizerConfig
$NonWhitespace* $Letter $NonWhitespace* {200};

# Assign rule status {100}=RBBI.WORD_NUM when the token contains a numeric char
# Mapped to <NUM> token type by DefaultICUTokenizerConfig
$NonWhitespace* $Number $NonWhitespace* {100};

# Assign rule status {1} (no RBBI equivalent) when the token contains neither a letter nor a numeric char
# Mapped to <OTHER> token type by DefaultICUTokenizerConfig
$NonWhitespace+ {1};
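To make the effect of these rules easier to follow, here is a rough Python sketch of what they do: break only on whitespace, then tag each token by the highest rule status that applies. It is only an approximation and not the ICU implementation. The names `rule_status` and `tokenize` are made up for illustration, Python's `str.split()` and `unicodedata` categories stand in for `\p{Whitespace}`, `\p{Letter}` and `\p{Number}`, and I am assuming that the numerically largest status wins when several rules match the same token.

```python
import unicodedata

# Rule status values and token types copied from the comments in the rule file.
WORD_LETTER, WORD_NUM, WORD_OTHER = 200, 100, 1
TOKEN_TYPES = {WORD_LETTER: "<ALPHANUM>", WORD_NUM: "<NUM>", WORD_OTHER: "<OTHER>"}


def rule_status(token):
    """Return the largest status among the rules that would match the token."""
    categories = [unicodedata.category(ch) for ch in token]
    if any(cat.startswith("L") for cat in categories):  # contains a letter
        return WORD_LETTER
    if any(cat.startswith("N") for cat in categories):  # contains a number
        return WORD_NUM
    return WORD_OTHER  # neither a letter nor a number


def tokenize(text):
    """Split on whitespace only and tag every token with its token type."""
    return [(tok, TOKEN_TYPES[rule_status(tok)]) for tok in text.split()]


for token, token_type in tokenize("ABC_123-x $!$ 42"):
    print(token, token_type)
```

The point to notice is that `ABC_123-x` stays a single token, because the only break points are the whitespace characters.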
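characters.txt (referenced by the types attribute of the WordDelimiterFilterFactory below), which makes the filter treat these characters as letters instead of delimiters:

_ => ALPHA
- => ALPHA
$ => ALPHA
! => ALPHA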
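The idea behind the type mappings can be sketched in a few lines of Python. This is a heavily simplified model and not the real WordDelimiterFilter: it only splits on delimiter characters and on letter/digit transitions, and it ignores case changes, possessive stemming, catenation and preserveOriginal. The helper names `char_type` and `split_parts` are hypothetical.

```python
# Characters that characters.txt remaps to ALPHA.
ALPHA_OVERRIDES = frozenset("_-$!")


def char_type(ch, overrides):
    """Classify a character as ALPHA, DIGIT or DELIM (simplified)."""
    if ch in overrides or ch.isalpha():
        return "ALPHA"
    if ch.isdigit():
        return "DIGIT"
    return "DELIM"


def split_parts(token, overrides=frozenset()):
    """Split a token on delimiters and on ALPHA/DIGIT transitions."""
    parts, current, current_type = [], "", None
    for ch in token:
        kind = char_type(ch, overrides)
        if kind == "DELIM":
            # Delimiters end the current part and are dropped.
            if current:
                parts.append(current)
            current, current_type = "", None
        elif kind == current_type:
            current += ch
        else:
            # Type changed (letter <-> digit): start a new part.
            if current:
                parts.append(current)
            current, current_type = ch, kind
    if current:
        parts.append(current)
    return parts


print(split_parts("net_cost-2024"))                   # default character types
print(split_parts("net_cost-2024", ALPHA_OVERRIDES))  # with the overrides applied
```

Without the overrides the underscore and hyphen act as split points and are discarded. With them, they are kept as part of the surrounding word, which is what makes searches on terms containing these characters behave as expected.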
<fieldType name="text___" class="solr.TextField" positionIncrementGap="100" indexed="true" stored="false">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\x{0000}.*\x{0000}" replacement=""/>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(#0;.*#0;)" replacement=""/>
    <tokenizer class="solr.ICUTokenizerFactory" rulefiles="Latn:Latin-break-only-on-whitespace.rbbi"/>
    <!-- <tokenizer class="org.apache.solr.analysis.WhitespaceTokenizerFactory" /> -->
    <filter class="org.apache.solr.analysis.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="1"
            catenateNumbers="1"
            catenateAll="1"
            splitOnCaseChange="1"
            splitOnNumerics="1"
            preserveOriginal="1"
            stemEnglishPossessive="1"
            types="characters.txt"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>
</fieldType>
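Since this field type depends on two external files, a small check like the following may help catch a missing or misnamed file before restarting Solr. It is only a sketch: `referenced_files` is a hypothetical helper, `SAMPLE` is a trimmed copy of the field type above, and I am assuming that `rulefiles` is a comma-separated list of `script:filename` pairs and that `types` may list one or more file names.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the field type, kept only to make the sketch runnable.
SAMPLE = """<fieldType name="text___" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.ICUTokenizerFactory"
               rulefiles="Latn:Latin-break-only-on-whitespace.rbbi"/>
    <filter class="org.apache.solr.analysis.WordDelimiterFilterFactory"
            types="characters.txt"/>
  </analyzer>
</fieldType>"""


def referenced_files(field_type_xml):
    """Collect file names named in rulefiles and types attributes."""
    root = ET.fromstring(field_type_xml)
    files = []
    for element in root.iter():
        # rulefiles entries look like "Latn:some-file.rbbi".
        for pair in element.get("rulefiles", "").split(","):
            if ":" in pair:
                files.append(pair.split(":", 1)[1].strip())
        # types holds one or more plain file names.
        for name in element.get("types", "").split(","):
            if name.strip():
                files.append(name.strip())
    return files


print(referenced_files(SAMPLE))
```

The returned names can then be compared against the contents of the conf directory of each store you changed.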
If you want to configure the same settings for the archive store, follow the same steps for "$SOLR_HOME\archive\conf".
Note: You will have to do a full re-index so that these settings are applied to the tokenization of existing content.
Hope this helps.
Thank you very much @abhinavmishra14 for the support, this works. We had not been able to implement it so far and had left it, but your solution works. We also did a full re-index as you said.