java - Fuzzy Queries in Lucene -

- July 15, 2010

I am using Lucene from Java and indexing a table in our database by company name. After indexing, I want to run a fuzzy match (Levenshtein distance) on each value we are about to insert into the database. The reason is that we do not want to enter duplicate companies because of spelling errors.

For example, if I already have the company name "WidgetMaker XYZ", then I do not want to insert "widget maker xyz" as well.

From what I have read, Lucene's fuzzy match algorithm should give me a number between 0 and 1. I want to run some tests and then settle on a threshold value that lets us decide which matches count as valid.

The problem is that I am stuck, and after searching everywhere on the Internet I need the help of the Stack Overflow community.

As I said, I have indexed the database on company name, and then I have the following code:

    IndexSearcher searcher = new IndexSearcher(directory);
    new QueryParser(Version.LUCENE_30, "company", analyzer);
    Query fuzzy_query = new FuzzyQuery(new Term("company", "center"));

I run into the problem after this: basically, I do not know how to get the fuzzy match value. I know the code should look something like the following, but none of the collectors seem to fit my needs (as you can see, I am currently only able to count the number of matches, which is useless to me).

    TopScoreDocCollector collector = TopScoreDocCollector.create(10, true);
    searcher.search(fuzzy_query, collector);
    System.out.println("\ncollector.getTotalHits() = " + collector.getTotalHits());

In addition, I am unable to use the class which is shown in the Lucene documentation. I am doing:

    import org.apache.lucene.queryParser.*;

Does anyone know why it is inaccessible, or what I am doing wrong? Apologies for the length of the question.

  
You do not need Lucene to get a score. Take a look at a dedicated string-similarity library; it is quite easy to use. Just add the jar and use it like this:

    Levenshtein ld = new Levenshtein();
    float sim = ld.getSimilarity(string1, string2);

Also note that, depending on the type of data (e.g. long strings, amount of whitespace, etc.), you may want to look at other algorithms such as Jaro-Winkler, Smith-Waterman, etc.

You could use this to collapse fuzzy-duplicate strings into one "master" string and then index on top of that.
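If you would rather not add a jar at all, the same kind of score can be sketched in plain Java. The class below is a hypothetical helper, not part of Lucene or of any similarity library: it computes the Levenshtein distance and divides it by the longer string's length to get a value between 0 and 1. The `normalize` step (trim plus lower-casing) and the 0.85 threshold are assumptions for illustration only; the real threshold should come from your own tests.

```java
import java.util.Locale;

// Hypothetical helper; names are illustrative and not taken from Lucene or any library.
class FuzzyNameMatch {

    // Classic dynamic-programming Levenshtein distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity in [0, 1]: 1.0 for identical strings, lower as more edits are needed.
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0; // two empty strings are treated as identical
        }
        return 1.0 - (double) levenshtein(a, b) / maxLen;
    }

    // Assumed pre-processing: ignore case and surrounding whitespace.
    static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }

    // The threshold is an assumption; pick it from tests on your own data.
    static boolean isLikelyDuplicate(String existing, String candidate, double threshold) {
        return similarity(normalize(existing), normalize(candidate)) >= threshold;
    }

    public static void main(String[] args) {
        String existing = "WidgetMaker XYZ";
        String candidate = "widget maker xyz";
        System.out.println(similarity(existing, candidate));                       // 0.625
        System.out.println(similarity(normalize(existing), normalize(candidate))); // 0.9375
        System.out.println(isLikelyDuplicate(existing, candidate, 0.85));          // true
    }
}
```

On the example from the question, the raw strings score only 0.625 because every upper-case letter counts as an edit, while the case-folded strings score 0.9375. So it is worth deciding how to treat case and whitespace before tuning the threshold.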






















