Validity of stemming
Mar. 28th, 2013 09:36 am

Search engines often reduce words to a common base form so that variant forms match one another. For instance, a search engine might reduce both "computer" and "computers" to "computer" so that if you search for the latter word, you also get documents that contain the former. This is called stemming.
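To make the idea concrete, here is a minimal sketch in Python. The suffix rules are modeled on the plural-stripping step (step 1a) of Porter's stemmer, not the full algorithm, and the names `stem`, `build_index` and `search` are hypothetical. The key point is that documents and queries go through the same `stem` function, so "computers" and "computer" land on the same index entry.

```python
from collections import defaultdict


def stem(word: str) -> str:
    """Strip plural endings, following the rules of Porter's step 1a."""
    w = word.lower()
    if w.endswith("sses"):
        return w[:-2]  # caresses -> caress
    if w.endswith("ies"):
        return w[:-2]  # ponies -> poni
    if w.endswith("ss"):
        return w       # caress -> caress (unchanged)
    if w.endswith("s"):
        return w[:-1]  # computers -> computer
    return w


def build_index(docs):
    """Map each stemmed term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[stem(token)].add(doc_id)
    return index


def search(index, query):
    """Stem the query exactly as the documents were stemmed, then look it up."""
    return sorted(index.get(stem(query), set()))


docs = {1: "my computer is slow", 2: "we sell computers"}
index = build_index(docs)
print(search(index, "computers"))  # [1, 2]
print(search(index, "computer"))   # [1, 2]
```

Both queries find both documents, because both words are stored and looked up under the single key "computer".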
However, if the words don't actually mean the same thing, this can cause problems. I'm going to use an example here which those who know me will find amusing. A naive algorithm might reduce "jesses" (as in the straps one keeps on a bird of prey's legs) to "jess" (as in Jess, short for Jessica). If it does so, a search for "jesses" also returns every document that mentions Jess, and there's no way of searching for "jesses" proper, unless the search engine permits the user to override the stemming for particular words.
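The collision, and one possible way of offering an override, can be sketched as follows. This is an illustration under assumptions, not how any particular search engine works: the `Index` class and its `exact` flag are hypothetical. The index stores each word twice, once stemmed and once as written (lowercased), so a user can ask for an exact match on a specific word instead of a stemmed one.

```python
from collections import defaultdict


def stem(word: str) -> str:
    """Naive plural stripping, modeled on step 1a of Porter's stemmer."""
    w = word.lower()
    if w.endswith("sses"):
        return w[:-2]  # jesses -> jess: this is the collision
    if w.endswith("ies"):
        return w[:-2]
    if w.endswith("ss"):
        return w       # jess -> jess
    if w.endswith("s"):
        return w[:-1]
    return w


class Index:
    """Hypothetical index that keeps both stemmed and exact (lowercased) terms."""

    def __init__(self):
        self.stemmed = defaultdict(set)
        self.exact = defaultdict(set)

    def add(self, doc_id, text):
        for token in text.split():
            self.stemmed[stem(token)].add(doc_id)
            self.exact[token.lower()].add(doc_id)

    def search(self, query, exact=False):
        """exact=True overrides stemming for this query word."""
        if exact:
            return sorted(self.exact.get(query.lower(), set()))
        return sorted(self.stemmed.get(stem(query), set()))


idx = Index()
idx.add(1, "the falcon wore leather jesses")
idx.add(2, "Jess went to the shops")
print(idx.search("jesses"))              # [1, 2]: Jess's document matches too
print(idx.search("jesses", exact=True))  # [1]: only the falconry document
```

The design choice here is to pay for a second set of index entries so that stemming stays the default while exact matching remains possible. Without the unstemmed copy, the distinction between "jesses" and "Jess" is lost at indexing time and no query can recover it.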