Lucene search and whitespaces

Question asked by sihnu on Dec 17, 2015

I have a datalist holding some data, and a webscript that runs a Lucene search against that datalist and returns the matching data. Lucene returns nothing if the data contains whitespace. I can fix that by making the property used in the search untokenized, but then searches against that property become case sensitive. Users type their queries into a search field and never use uppercase letters, but every value in the datalist begins with an uppercase character, so this is a big issue for us. How can I set up tokenized properties so that Lucene finds values that contain whitespace (or Scandinavian letters like ä and ö)? Or how can I make searches against non-tokenized properties case insensitive?

I have tried escaping whitespace with \\_x0020_ and \\u00A, but that doesn't work. I have also tried a query like prop:"ABC DEF", but that doesn't work either. The only way I have got this working is to automatically transform the first letter to uppercase with JavaScript, but that doesn't sound like a good solution to the problem.
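
For reference, the uppercase workaround looks roughly like the sketch below. The helper names capitalizeFirst and buildPhraseQuery are made up for illustration, and prop stands in for the real property field. It only builds the query string and assumes the stored value differs from the user's input in the case of the first letter only.

```javascript
// Hypothetical helper: upper-case only the first character of the user's input.
function capitalizeFirst(term) {
  // Strip leading/trailing whitespace without relying on String.prototype.trim.
  var trimmed = String(term).replace(/^\s+|\s+$/g, "");
  if (trimmed.length === 0) {
    return trimmed;
  }
  return trimmed.charAt(0).toUpperCase() + trimmed.slice(1);
}

// Hypothetical helper: build a quoted phrase query for one field.
// "field" is a placeholder for the real property field name.
function buildPhraseQuery(field, term) {
  var value = capitalizeFirst(term)
    .replace(/\\/g, "\\\\") // escape backslashes first
    .replace(/"/g, '\\"');  // then escape double quotes inside the phrase
  return field + ':"' + value + '"';
}

// Example: the user types a lowercase term containing a space.
var query = buildPhraseQuery("prop", "abc def");
```

This keeps the property untokenized, so it still fails whenever the stored value differs in case anywhere other than the first letter.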