java - Fuzzy Queries in Lucene


I am using Lucene in Java and indexing a table in our database based on the company name. After indexing, I want to do a fuzzy match (Levenshtein distance) on a value that we want to insert into the database. The reason is that we do not want to create duplicate entries because of spelling errors.

For example, if the index already contains the company name "WidgetMaker XYZ", then I do not want to insert "widget maker xyz".
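The two names in this example differ only in letter case and spacing, so one possible first step is to normalize names before any fuzzy comparison. The following is a minimal sketch under that assumption, using only the Java standard library; the class `NameNormalizer` and the method `normalize` are hypothetical names and not part of Lucene.

```java
import java.util.Locale;

class NameNormalizer {
    // Hypothetical helper: lower-case the name and remove all whitespace,
    // so that differences in case and spacing do not count as differences.
    static String normalize(String name) {
        return name.toLowerCase(Locale.ROOT).replaceAll("\\s+", "");
    }

    public static void main(String[] args) {
        String existing = "WidgetMaker XYZ";
        String candidate = "widget maker xyz";
        // Compare the normalized forms instead of the raw strings.
        boolean sameAfterNormalizing = normalize(existing).equals(normalize(candidate));
        System.out.println("same after normalizing: " + sameAfterNormalizing);
    }
}
```

Normalization like this only catches case and spacing variants; real spelling errors still need a distance-based comparison.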

From what I have read, Lucene's fuzzy match algorithm should give me a number between 0 and 1. I want to do some testing and then settle on a threshold value that lets us decide what counts as a match and what does not.

The problem is that I am stuck, and after searching everywhere on the Internet I need the help of the Stack Overflow community.

As I said, I have indexed the database on the company name, and then I have the following code:

  IndexSearcher searcher = new IndexSearcher(directory);
  new QueryParser(Version.LUCENE_30, "company", analyzer);
  Query fuzzy_query = new FuzzyQuery(new Term("company", "center"));

This is where I run into the problem: basically, I do not know how to get the fuzzy match value. I know the code should look something like the following, although none of the collectors seem to fit my needs (as you can see, right now I can only count the number of matches, which is useless to me).

  TopScoreDocCollector collector = TopScoreDocCollector.create(10, true);
  searcher.search(fuzzy_query, collector);
  System.out.println("\ncollector.getTotalHits() = " + collector.getTotalHits());

In addition to this, I am unable to use the QueryParser class that is shown in the Lucene documentation. I am importing:

  import org.apache.lucene.queryParser.*;

Does anyone know why this is inaccessible or what I am doing wrong? Apologies for the length of the question.

You do not need Lucene to get a score. There is a library for this that is quite easy to use. Just add the jar and use it like this:

  Levenshtein ld = new Levenshtein();
  float sim = ld.getSimilarity(string1, string2);

Also note that, depending on the type of data (i.e. long strings, number of whitespaces, etc.), you might want to look at other algorithms such as Jaro-Winkler, Smith-Waterman, etc.
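To illustrate what such a similarity score between 0 and 1 could look like without any external jar, here is a minimal sketch in plain Java. It computes the Levenshtein distance by dynamic programming and divides by the length of the longer string. That normalization is an assumption and may not match what the library above or Lucene computes. The class name `LevenshteinSimilarity`, its methods, and the threshold value are hypothetical.

```java
class LevenshteinSimilarity {
    // Edit distance via dynamic programming, keeping only two rows of the table.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity in [0, 1]; 1.0 means identical strings.
    // Assumption: normalize by the length of the longer string.
    static float similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0f; // two empty strings are treated as identical
        }
        return 1.0f - (float) distance(a, b) / maxLen;
    }

    public static void main(String[] args) {
        float threshold = 0.85f; // hypothetical cut-off, to be tuned on real data
        float sim = similarity("widget maker xyz", "widget makers xyz");
        System.out.println("similarity = " + sim + ", duplicate? " + (sim >= threshold));
    }
}
```

A score like this can be tested against known duplicate and non-duplicate pairs to pick a cut-off that suits the data.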

You can map each fuzzy duplicate to a single "master" string and then put that master string in the index.
