nlp - Splitting a domain name into constituent words (if possible)?
I want to break a domain name into its component words and numbers, for example:
iamadomain11.com = ['I', 'am', 'A', 'domain', '11']
How do I do this? I know that several segmentations may be possible, but I am fine with getting just one of them.
This is actually solved in the O'Reilly Media book Beautiful Data. Chapter 14, "Natural Language Corpus Data", builds a splitter that does what you are asking for, using a huge, freely available token frequency data set.
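To give a rough idea of how a frequency-based splitter can work, here is a minimal sketch in Python. It is not the code from the book: the word counts in `COUNTS` are invented for illustration, the penalty for unknown words is an assumed smoothing rule, and the names `segment` and `split_domain` are hypothetical. A real splitter would load counts from a large corpus-derived frequency list instead of the toy table.

```python
import math
import re
from functools import lru_cache

# Toy unigram counts, invented for illustration only. A real splitter would
# load these from a large token frequency data set.
COUNTS = {
    "i": 500, "am": 300, "a": 800, "domain": 50, "do": 200, "main": 60,
    "in": 700, "name": 80,
}
TOTAL = sum(COUNTS.values())
MAX_WORD_LEN = 20  # assumed upper bound on the length of a single word


def word_logprob(word):
    """Log-probability of one word under the toy unigram model."""
    if word in COUNTS:
        return math.log(COUNTS[word] / TOTAL)
    # Assumed smoothing: unknown strings are penalized, more so when longer.
    return math.log(1.0 / (TOTAL * 10 ** len(word)))


@lru_cache(maxsize=None)
def segment(text):
    """Return (score, words) for the most probable split of a letter string."""
    if not text:
        return 0.0, ()
    candidates = []
    # Try every possible first word, then segment the remainder recursively.
    for i in range(1, min(len(text), MAX_WORD_LEN) + 1):
        head, tail = text[:i], text[i:]
        tail_score, tail_words = segment(tail)
        candidates.append((word_logprob(head) + tail_score, (head,) + tail_words))
    return max(candidates)


def split_domain(domain):
    """Split the first label of a domain name into words and numbers."""
    label = domain.lower().split(".")[0]
    tokens = []
    # Separate runs of letters from runs of digits, then segment the letters.
    for chunk in re.findall(r"[a-z]+|[0-9]+", label):
        if chunk.isdigit():
            tokens.append(chunk)
        else:
            tokens.extend(segment(chunk)[1])
    return tokens


print(split_domain("iamadomain11.com"))  # ['i', 'am', 'a', 'domain', '11']
```

The sketch returns only the single highest-scoring segmentation, which matches the "just one possibility" requirement in the question. It lowercases its input, looks only at the part before the first dot, and ignores characters such as hyphens, so those would need extra handling for real domain names.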