Stemming in NLP

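Stemming reduces words to a common root form, or stem, usually by chopping off suffixes with heuristic rules. The stem does not have to be a dictionary word. For example, the Porter stemmer reduces both "studies" and "studying" to "studi". Stemming helps normalize text and is useful for tasks such as search, indexing, and text mining.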
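To make "chopping off suffixes" concrete, here is a deliberately naive sketch. The helper `naive_stem`, its suffix list, and the minimum stem length of 3 are hypothetical choices made for illustration. This is not how NLTK's stemmers work internally.

```python
# Hypothetical suffix list; longer suffixes are tried first so "ies" wins over "s".
SUFFIXES = ["ing", "ies", "ly", "ed", "es", "s"]

def naive_stem(word: str, min_stem: int = 3) -> str:
    """Strip the longest matching suffix, keeping at least min_stem characters."""
    word = word.lower()
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word  # no suffix matched, or the remainder would be too short

for w in ["running", "runs", "cats", "studies", "studying", "is"]:
    print(w, "->", naive_stem(w))
```

Blind stripping like this leaves "running" as "runn" and splits "studies" and "studying" into "stud" and "study". Real stemmers such as Porter add conditions on the remaining stem to reduce exactly these kinds of errors.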

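Why Use Stemming?

Stemming collapses different surface forms of a word into one token. In search, a query for "run" can then match documents that contain "runs" or "running". In indexing and text mining, it shrinks the vocabulary so that counts for related forms are pooled. The cost is precision. A rule-based stemmer can merge unrelated words (overstemming) or leave related forms apart (understemming), and its output is often not a real word.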

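The search benefit can be sketched with a toy inverted index that is built twice, once on raw tokens and once on stemmed tokens. `crude_stem` is a hypothetical stand-in that only strips "ing" or a final "s". In practice you would pass a real stemmer such as NLTK's `PorterStemmer().stem` as the normalizer.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set

def crude_stem(word: str) -> str:
    # Hypothetical stand-in for a real stemmer: strips only "ing" or a final "s",
    # keeping at least three characters.
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def build_index(docs: List[str], normalize: Callable[[str], str]) -> Dict[str, Set[int]]:
    # Map each normalized token to the ids of the documents that contain it.
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in text.lower().split():
            index[normalize(token)].add(doc_id)
    return index

docs = ["the dog jumps", "two dogs jumping", "a cat sleeps"]
exact = build_index(docs, lambda t: t)   # no normalization
stemmed = build_index(docs, crude_stem)  # every token stemmed before indexing

query = "dog"
print(sorted(exact.get(query, set())))               # → [0]
print(sorted(stemmed.get(crude_stem(query), set())))  # → [0, 1]
```

The query must go through the same normalizer as the documents. Otherwise the stemmed index will miss matches.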
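Popular Stemming Algorithms

Porter Stemmer: Martin Porter's 1980 algorithm, which strips English suffixes through a fixed sequence of rule steps.
Snowball Stemmer: Porter's later framework for writing stemmers. Its English stemmer (often called Porter2) refines the original rules, and NLTK's SnowballStemmer also supports several other languages (listed in SnowballStemmer.languages).
Lancaster Stemmer: the Paice/Husk stemmer from Lancaster University. It applies its rules iteratively and is more aggressive, often producing shorter stems than Porter or Snowball.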

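Most Porter rules are conditioned on the measure m of the candidate stem. If the stem is written as alternating consonant groups C and vowel groups V, it has the form [C](VC)^m[V]. For example, the step-3 rule (m>0) NESS → "" turns "goodness" into "good". The sketch below computes m using Porter's definition of a consonant. The names `is_consonant` and `measure` are illustrative and are not NLTK's API.

```python
def is_consonant(word: str, i: int) -> bool:
    """Porter's definition: a, e, i, o, u are vowels; y is a vowel only after a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        # At the start of the word, or after a vowel, y acts as a consonant.
        return i == 0 or not is_consonant(word, i - 1)
    return True

def measure(stem: str) -> int:
    """Return m for a lowercase stem of the form [C](VC)^m[V]."""
    pattern = "".join("C" if is_consonant(stem, i) else "V" for i in range(len(stem)))
    # Each vowel run followed by a consonant run contributes exactly one "VC" boundary.
    return pattern.count("VC")

for w in ["tree", "trouble", "oats", "ivy", "troubles", "private"]:
    print(w, measure(w))  # m = 0, 1, 1, 1, 2, 2
```

These values match the examples in Porter's description of the algorithm. Requiring m>0 or m>1 before stripping a suffix keeps short words like "tree" from being cut down to almost nothing.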
10 Examples of Stemming

from nltk.stem import PorterStemmer, SnowballStemmer, LancasterStemmer

words = ["running", "runs", "runner", "easily", "fairly", "cats", "studies", "studying"]

# Example 1: Using Porter Stemmer
porter = PorterStemmer()
print([porter.stem(word) for word in words])

# Example 2: Using Snowball Stemmer (English)
snowball = SnowballStemmer("english")
print([snowball.stem(word) for word in words])

# Example 3: Using Lancaster Stemmer
lancaster = LancasterStemmer()
print([lancaster.stem(word) for word in words])

# Example 4: Comparing the three stemmers on a long derived noun
word = "consolidation"
print("Porter:", porter.stem(word))
print("Snowball:", snowball.stem(word))
print("Lancaster:", lancaster.stem(word))

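# Example 5: Stemming irregular plural nouns
# Suffix stripping handles regular plurals, but it cannot turn "geese" into "goose" or "wolves" into "wolf".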
print(porter.stem("geese"))
print(porter.stem("wolves"))

# Example 6: Compound words
# Stemmers do not split compounds; "multiuser" is processed as a single word.
print(porter.stem("multiuser"))
print(snowball.stem("multiuser"))

# Example 7: Derivational suffixes (-ness, -ly)
print(porter.stem("happiness"))
print(porter.stem("happily"))

# Example 8: Words with prefixes
# Porter and Snowball only strip suffixes, so the "un-" prefix is kept.
print(porter.stem("unhappy"))
print(snowball.stem("unhappy"))

# Example 9: Irregular verbs
# "went" is left unchanged rather than mapped to "go"; irregular forms need a lemmatizer or a lookup table.
print(porter.stem("went"))
print(snowball.stem("went"))

# Example 10: Hyphenated words
# The stemmer treats "co-operate" as one string, so the result depends on how the text was tokenized.
print(porter.stem("co-operate"))
print(snowball.stem("co-operate"))
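Examples 5, 9, and 10 show cases that suffix stripping alone does not handle. One common workaround is to tokenize first and consult a small exception table before stemming. The table `IRREGULAR` and the functions `tokenize` and `normalize` are hypothetical. The `stem` argument can be any callable, such as `porter.stem` from the examples above. An identity function stands in here so the sketch runs without NLTK.

```python
import re
from typing import Callable, Dict, List

# Hypothetical exception table: irregular forms mapped to a base form.
IRREGULAR: Dict[str, str] = {"went": "go", "geese": "goose", "wolves": "wolf"}

def tokenize(text: str) -> List[str]:
    # Keep only runs of ASCII letters, so "co-operate" -> ["co", "operate"].
    return re.findall(r"[a-z]+", text.lower())

def normalize(text: str, stem: Callable[[str], str]) -> List[str]:
    # Replace irregular forms with their base form, then stem every token so that
    # "geese" and "goose" end up with the same stem.
    return [stem(IRREGULAR.get(tok, tok)) for tok in tokenize(text)]

identity = lambda w: w  # stand-in; pass porter.stem here when NLTK is available
print(normalize("The geese went to co-operate", identity))
# → ['the', 'goose', 'go', 'to', 'co', 'operate']
```

Mapping first and stemming second keeps irregular forms consistent with their regular counterparts. Splitting on hyphens is a trade-off: it separates "co" into its own token, which may or may not be what your application wants.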