java - Lucene Search with Unicode Characters
I have indexed a database of texts, and the texts are Unicode-encoded. When I search for an English term with Lucene, everything works fine. But when I use a non-English query such as "تو", I get the following exception:
    Exception in thread "main" org.apache.lucene.queryParser.ParseException: Cannot parse '??': '*' or '?' not allowed as first character in WildcardQuery
        at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:187)
        at Search.main(Search.java:151)
    Caused by: org.apache.lucene.queryParser.ParseException: '*' or '?' not allowed as first character in WildcardQuery
        at org.apache.lucene.queryParser.QueryParser.getWildcardQuery(QueryParser.java:923)
        at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1347)
        at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1250)
        at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1178)
        at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1167)
        at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:182)
        ... 1 more
What should I do?
Thank you.
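The '??' in the exception message hints at what went wrong: the two Persian letters most likely reached QueryParser as two literal question marks, and since '?' is a wildcard in Lucene's query syntax, a term that starts with it is rejected. The asker's code is not shown, so the following is only a minimal illustration, assuming the text passed through a charset that cannot represent Arabic-script letters (ISO-8859-1 is used here purely as an example). The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class LostCharactersDemo {
    // Encode with a charset that cannot represent Persian letters, then decode again.
    // String.getBytes(Charset) substitutes the charset's replacement byte
    // ('?' for ISO-8859-1) for every unmappable character.
    static String throughLatin1(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // The same round trip with UTF-8 preserves every character.
    static String throughUtf8(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The Persian query from the question, written as escapes so the
        // source file's own encoding does not matter.
        String query = "\u062A\u0648";
        System.out.println(throughLatin1(query));              // prints ??
        System.out.println(throughUtf8(query).equals(query));  // prints true
    }
}
```

The lossy version produces exactly the '??' string that appears in the stack trace, which is why the parser complains about a leading wildcard rather than about the Persian text itself.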
There are two things to check here:
- Make sure your source files (*.java) are saved in UTF-8 encoding.
- If the platform's default encoding is something other than UTF-8, Java may decode non-ASCII text incorrectly (characters can turn into '?'). Specify the encoding explicitly when reading, for example: `new InputStreamReader(new FileInputStream(filename), "UTF-8");`
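A minimal sketch of the second point, assuming the query or the indexed text is read from a UTF-8 file. `Utf8ReadDemo` and `readUtf8` are hypothetical names; only standard JDK classes are used, and the reader is built exactly as suggested above.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8ReadDemo {
    // Read a whole text file, decoding it explicitly as UTF-8 instead of
    // relying on the platform default charset.
    static String readUtf8(String filename) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(filename), "UTF-8"))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("query", ".txt");
        // Write the Persian query from the question as UTF-8 bytes.
        Files.write(tmp, "\u062A\u0648".getBytes(StandardCharsets.UTF_8));
        String text = readUtf8(tmp.toString());
        System.out.println(text.equals("\u062A\u0648")); // prints true
        Files.delete(tmp);
    }
}
```

For the first point, when compiling from the command line you can also tell the compiler which encoding the sources use, e.g. `javac -encoding UTF-8 Search.java`; writing non-ASCII literals as Unicode escapes, as in the sketch, sidesteps the issue entirely.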